I don't like NumPy

488 MinimalAction 206 5/15/2025, 4:05:22 PM dynomight.net ↗

Comments (206)

WCSTombs · 49d ago

If your arrays have more than two dimensions, please consider using Xarray [1], which adds dimension naming to NumPy arrays. Broadcasting and alignment then becomes automatic without needing to transpose, add dummy axes, or anything like that. I believe that alone solves most of the complaints in the article.

Compared to NumPy, Xarray is a little thin in certain areas like linear algebra, but since it's very easy to drop back to NumPy from Xarray, what I've done in the past is add little helper functions for any specific NumPy stuff I need that isn't already included, so I only need to understand the NumPy version of the API well enough one time to write that helper function and its tests. (To be clear, though, the majority of NumPy ufuncs are supported out of the box.)

I'll finish by saying, to contrast with the author, I don't dislike NumPy, but I do find its API and data model to be insufficient for truly multidimensional data. For me three dimensions is the threshold where using Xarray pays off.

[1] https://xarray.dev

renjimen · 49d ago

Xarray is great. It marries the best of Pandas with Numpy.

Indexing like `da.sel(x=some_x).isel(t=-1).mean(["y", "z"])` makes code so easy to write and understand.

Broadcasting is never ambiguous because dimension names are respected.

It's very good for geospatial data, allowing you to work in multiple CRSs with the same underlying data.

We also use it a lot for Bayesian modeling via Arviz [1], since it makes the extra dimensions you get from sampling your posterior easy to handle.

Finally, you can wrap many arrays into datasets, with common coordinates shared across the arrays. This allows you to select `ds.isel(t=-1)` across every array that has a time dimension.

[1] https://www.arviz.org/en/latest/

mgunyho · 49d ago

Seconded. Xarray has mostly replaced bare NumPy for me and it makes me so much more productive.

ddtaylor · 49d ago

Is there anything similar to this for something like Tensorflow, Keras or Pytorch? I haven't used them super recently, but in the past I needed to do all of the things you just described in painful to debug ways.

mgunyho · 49d ago

For Torch, I have come across Named Tensors, which should work in a similar way: https://docs.pytorch.org/docs/stable/named_tensor.html

The docs say that it's a prototype feature, and I think it has been that way for a few years now, so no idea how production-ready it is.

xiphias2 · 49d ago

It's a much worse API than Xarrays, it seems like somebody should build it on top of PyTorch.

SimplyUnknown · 49d ago

I really like einops. This works for numpy, pytorch and keras/tensorflow and has easy named transpose, repeat, and eimsum operations.

rsfern · 48d ago

Same - I’ve been using einops and jaxtyping together pretty extensively recently and it helps a lot for reading/writing multidimensional array code. Also array_api_compat, the API coverage isn’t perfect but it’s pretty satisfying to write code that works for both PyTorch and numpy arrays

https://docs.kidger.site/jaxtyping/

https://data-apis.org/array-api-compat/

toth · 49d ago

For pytorch the analogue is Named Tensors, but it's a provisional feature and not supported everywhere.

https://docs.pytorch.org/docs/stable/named_tensor.html

michaelbarton · 49d ago

Thanks for sharing this library. I will give it a try.

For a while I had a feeling that I was perhaps a little crazy for seeming to be only person to really dislike the use of things like ‘array[:, :, None]’ and so forth.

insane_dreamer · 49d ago

along those lines, for biosignal processing, NeuroPype[0] also builds on numpy and implements named axes for n-dimensional tensors, with the ability to store per-element data (i.e. channel names, positions, etc.) for each axis

[0] https://www.neuropype.io/docs/

fsndz · 49d ago

xarray is nice

inasio · 49d ago

Life goes full circle sometimes. I remember that Numpy roughly came out of the amalgamation of the Numeric and Numarray libraries; I want to imagine that the Numarray people kept fighting these past 20 years to prove they were the right solution, at some point took some money from Elon Musk and renamed to Xarray [0], and finally started beating Numpy.

[0] most of the above is fiction

No comments yet

ChrisRackauckas · 49d ago

One of the reasons why I started using Julia was because the NumPy syntax was so difficult. Going from MATLAB to NumPy I felt like I suddenly became a mediocre programmer, spending less time on math and more time on "performance engineering" of just trying to figure out how to use NumPy right. Then when I went to Julia it made sense to vectorize when it felt good and write a loop when it felt good. Because both are fast, focus on what makes the code easiest to read an understand. This blog post encapsulates exactly that experience and feeling.

Also treating things like `np.linalg.solve` as a black box that is the fastest thing in the world and you could never any better so please mangle code to call it correctly... that's just wrong. There's many reasons to build problem specific linear algebra kernels, and that's something that is inaccessible without going deeper. But that's a different story.

abdullahkhalids · 49d ago

The reason is quite simple. Julia is a language designed for scientific computation. Numpy is a library frankenstein-grafted onto a language that isn't really designed for scientific computation.

We can only hope that Julia somehow wins and those of forced to work in python because of network effects can be freed.

waffletower · 48d ago

Julia really needs to generalize and clarify its deployment story before it could possibly take off. It was built with a promise of generality but its tethered packaging is an albatross.

lenkite · 48d ago

Nah, because "Python is second-best at everything" and has more libraries than the Galactic Federation, that will never happen.

davedx · 48d ago

This has me wondering, if not that, then what is python designed for?

xypage · 48d ago

To be general purpose with a very large package ecosystem so that you can get just about anything started pretty quickly. It is relatively easy to do things that aren't performance critical with python, which is great if you want an MVP to grow off of or if you're messing around with something and want to make a little flask server for it, or maybe run some image recognition, or a little anything. It's really just insanely flexible and puts a lot of the cognitive load in libraries so you can get straight to doing stuff, again, often at the cost of performance.

lolc · 47d ago

The best description I heard of Python is "glue for C libs".

derriz · 46d ago

Originally a scripting language - from what I recall. An alternative to Perl or Tcl.

glial · 48d ago

I've been hoping this for 15 years, but so far I still use Python for everything.

jampekka · 49d ago

> Going from MATLAB to NumPy I felt like I suddenly became a mediocre programmer, spending less time on math and more time on "performance engineering" of just trying to figure out how to use NumPy right.

Matlab is about as slow without readily vectorized operations as Python.

Slowness of Python is a huge pain point, and Julia has a clear advantage here. Sadly Julia is practically just unusable beyond quite niche purposes.

Python does now have quite serviceable jit hacks that let one escape vectorization tricks, but they are still hacks and performant Python alternative would be very welcome. Sadly there aren't any.

AnotherGoodName · 49d ago

One of the things i suspect will happen very soon is that all languages will become as fast as each other and your reason to use one over the other for performance reasons might not exist. I know this sounds optimistic and wishful thinking but hear me out here;

Modern AIs are literal translation engines. That's their origin. The context that lets them do what they do was originally there to allow good translations. It doesn't matter if the translation is programming language output, or actual language output. I can today ask an AI to rewrite a chunk of Matlab code into Rust and it works! There's even code generators that will utilize the GPU where sensible.

We're really not that far off that we can write code in any language and transparently behind the scenes have it actually running on a more performant backend when needed. Ideally you keep the Matlab/Python frontend and the translation will be on the intermediate layers in a two way fashion so step through/debug works as expected.

Based on playing with the above steps manually with good results we're almost at the stage of just needing some glue to make it all work. Write in Matlab/Python, and run as fast as any LLVM backed language.

codethief · 49d ago

Sounds like transpiling Typescript to JavaScript, except that you translate to a vastly different language, thereby making you, the programmer, largely unable to reason about the performance characteristics of your code ("Can the transpiler optimize this Python loop or nah?"), and you also throw in the indeterminism of LLMs. Oooof, I'm not sure I'd want that.

AnotherGoodName · 49d ago

When you're using Python the type of performance characteristics you're usually thinking of are "can i call an external library that runs this on the GPU" or "is this log(n) or n^2 complexity?" and such language translations shouldn't change those parts.

As in the language is regularly benchmarked at 10-100x slower than other languages. It's not a languages for caring about subtle performance characteristics outside "am i calling the external library correctly or not?" (as in this article) or "is this algorithm exponential time?".

lolc · 47d ago

There are enough ambiguities between language libs, even different versions of the same lib, that transpilation will always depend on verification.

It'll work for toy examples. For nontrivial code the generated transpilation will flame out in a different way on every run.

leephillips · 48d ago

“quite niche purposes”

I don’t understand this. Julia is a general-purpose language that I increasingly use to replace bash scripts. I’ve also happily used it for web scraping tasks¹, interactive web² projects³, generation of illustrations and animations, and, yes, scientific calculations. Aside from the language itself, which is fun to program in, its brilliant package system massively facilitates composing libraries into a solution.

1 https://lee-phillips.org/amazonJuliaBookRanks

2 https://www.admin-magazine.com/Archive/2025/85

3 https://lee-phillips.org/pluckit

Zambyte · 49d ago

> Sadly Julia is practically just unusable beyond quite niche purposes.

Why? Just the ecosystem?

jampekka · 49d ago

Mainly the lack of practical support for non-REPL/Notebook usage and interoperability.

joshjob42 · 47d ago

Interop with what? It has a very good FFI for C, PyCall, RCall, etc. work quite well.

It is pretty REPL focused, but what is the thing you want that you don't have? Like better debugger support in VSCode or something?

No comments yet

Zambyte · 47d ago

Just happened to find this article, seems to do a good job of explaining why.

https://news.ycombinator.com/item?id=44013455

tkuraku · 48d ago

I've found matlabs's JIT to be about on par with numba for loops.

bandrami · 49d ago

jampekka · 49d ago

R is for sure not a Python replacement. And it's even slower than Python.

bandrami · 48d ago

R has significant performance advantages over Python if you are keeping it in its lane. But it's a DSL and not a general-purpose language.

Bostonian · 49d ago

"Then when I went to Julia it made sense to vectorize when it felt good and write a loop when it felt good. Because both are fast, focus on what makes the code easiest to read an understand."

This is true of modern Fortran also.

sundarurfriend · 49d ago

The amount of work that's been put into Fortran by the LFortran [1] folks and others in recent years [2] is nothing short of absurdly impressive!

[1] https://lfortran.org/

[2] "in recent years" because that's when I became aware of this stuff, not to say there wasn't effort before then

pklausler · 49d ago

If you’re going to give praise, include the GNU Fortran developers, who have really set the standard for open source production quality Fortran compiler development.

sundarurfriend · 49d ago

That sounds great, kudos to them too!

The only times I set foot in the Fortran world is when some Julia-related development [1] [2] happens, and then I peruse some surrounding links and related projects - so I have very little familiarity with GNU Fortran myself.

[1] https://fortran-lang.discourse.group/t/update-on-prima-a-jul... [2] https://discourse.julialang.org/t/ann-julia-back-end-for-lfo...

ChrisRackauckas · 49d ago

Indeed, I also like modern Fortran.

waffletower · 48d ago

I like drag racing too, and modern Fortran is really competitive on the track. But I think I need a car that is street legal for my needs.

amluto · 49d ago

Is MATLAB materially different? Loops are slow (how slow depends on the version), and the fastest thing in the world is the utter perfection of the black box called '\'.

moregrist · 48d ago

The last time I seriously used Matlab, over 10 years ago, they had implemented JIT compilation and often straightforward loops were a lot faster than trying to vectorize. And definitely less error-prone.

Iirc, ‘\’ was just shorthand for solving a system of equations, which you could mentally translate into the appropriate LAPACK function without loss of precision. You did have to be a little more careful about making extra copies than in Fortran or C (or even Python). But nothing was magic.

brosco · 49d ago

Compared to Matlab (and to some extent Julia), my complaints about numpy are summed up in these two paragraphs:

> Some functions have axes arguments. Some have different versions with different names. Some have Conventions. Some have Conventions and axes arguments. And some don’t provide any vectorized version at all.

> But the biggest flaw of NumPy is this: Say you create a function that solves some problem with arrays of some given shape. Now, how do you apply it to particular dimensions of some larger arrays? The answer is: You re-write your function from scratch in a much more complex way. The basic principle of programming is abstraction—solving simple problems and then using the solutions as building blocks for more complex problems. NumPy doesn’t let you do that.

Usually when I write Matlab code, the vectorized version just works, and if there are any changes needed, they're pretty minor and intuitive. With numpy I feel like I have to look up the documentation for every single function, transposing and reshaping the array into whatever shape that particular function expects. It's not very consistent.

jampekka · 49d ago

Matlab's support for more than 2 dimensions in arrays is so bad that it's rare to encounter the situations lamented in TFA.

treefarmer · 49d ago

Yeah, in case anyone else has the misfortune of having to work with multi-dimensional data in MATLAB, I'd recommend the Tensor Toolbox, Tensorlab, or the N-way Toolbox.

nycticorax · 48d ago

There are certainly some aspects of it that are inelegant, in the interests of backwards compatibility, but otherwise I don't know what you are talking about. Matlab supports >2d arrays just fine, and has for at least 20 years.

IshKebab · 49d ago

Well it's not Tenslab!

breakds · 49d ago

For the second problem, if I understand it correctly, you might want to try `vmap` from jax.

almostgotcaught · 48d ago

> Now, how do you apply it to particular dimensions of some larger arrays?

What exactly is the complaint here? If I write a function on a 2x2 array, there's no way to apply it to tiles in a 3x2x2 array? But there is - take a slice and squeeze. What's the issue?

Edit: if the issue is some weird version of "why can't I apply my function to an arbitrary set of indices" (like the 0,1 in a 2x2x2 array) I'm at a loss for words (because that's a completely different shape - indeed it's the entire array).

DrFalkyn · 49d ago

reshape

vector_spaces · 49d ago

My main issue with numpy is that it's often unclear what operations will be vectorized or how they will be vectorized, and you can't be explicit about it the way you can with Julia's dot syntax.

There are also lots of gotchas related to types returned by various functions and operations.

A particularly egregious example: for a long time, the class for univariate polynomial objects was np.poly1d. It had lots of conveniences for doing usual operations on polynomials

If you multiply a poly1d object P on the right by a complex scalar z0, you get what you probably expect: a poly1d with coefficients scaled by z0.

If however you multiply P on the left by z0, you get back an array of scaled coefficients -- there's a silent type conversion happening.

  P*z0 # gives a poly1d
  z0*P # gives an array

I know that this is due to Python associativity rules and laissez-faire approach to datatypes, but it's fairly ugly to debug something like this!

Another fun gotcha with poly1d: if you want to access the leading coefficient for a quadratic, you can do so with either P.coef[0] or P[2]. No one will ever get these confused, right?

These particular examples aren't really fair because the numpy documentation describes poly1d as a part of the "old" polynomial API, advising new code to be written with the `Polynomial` API -- although it doesn't seem it's officially deprecated and no warnings are emitted when you use poly1d.

Anyway, there are similar warts throughout the library. Lots of gotchas having the shape of silent type conversions and inconsistent datatypes returned by the same method depending on its inputs that are downright nightmarish to debug

dumah · 48d ago

This is because of the dispatch semantics for dunder operator methods, which the Python docs are horrible at communicating.

The types returned as you described are exactly what I’d expect and idiomatic for Python. This isn’t a problem with numpy.

SirHumphrey · 49d ago

The main problem (from my point of view) of python data science ecosystem is a complete lack of standardization on anything.

You have ten different libraries that try to behave like 4 other languages and the only point of standardization is that there is usually something like .to_numpy() function. This means that most of the time I was not solving any specific problem related to data processing, but just converting data from a format one library would understand to something another library would understand.

In Julia (a language with it's own problems, of course) things mostly just work. The library for calculating uncertainties interacts well with a library handling units and all this works fine with the piloting library, or libraries solving differential equations etc. In python, this required quite a lot of boilerplate.

Evidlo · 49d ago

Nobody has mentioned array-api (and data-apis more generally), which is trying to standardize the way people interact with arrays across the Python ecosystem.

https://github.com/data-apis/array-api

https://data-apis.org/blog/announcing_the_consortium/

bornfreddy · 48d ago

Sounds like a great idea, but difficult to achieve. The announcement blog post was almost 5 years ago, do you know maybe what the impact of this project has been in practice?

nickledave · 42d ago

There are yearly releases of the standard https://data-apis.org/blog/ and I often see it ref'd on issues in individual libraries (numpy, pytorch)

See also funding from CZI (on the blog)

Subjectively I do find it helps with consistency -- and when things are not consistent, it's easier to discover what's different and why

edit: but I completely agree with this post and the follow-up from the same author on "dumbpy". At least at first blush, I need to read in more detail.

HdS84 · 49d ago

R with its 4(?) class systems enters the chat.

ChrisRackauckas · 49d ago

It's at least 5 at this point.

rienbdj · 49d ago

In defense of R the class systems do have different characteristics and they are not deeply embedded in the language or anything.

blululu · 49d ago

The author brings up some fair points. I feel like I had all sorts of small grievances transitioning from Matlab to Numpy. Slicing data still feels worse on Numpy than Matlab or Julia, but this doesn't justify the licensing costs for the Matlab stats/signal processing toolbox.

The issues that are presented in this article mostly related to tensors of rank >2. Numpy is originally just matrices so it is not surprising that it has problems in this domain. A dedicated library like Torch is certainly better. But Torch is difficult in its own ways. IDK, I guess the author's conclusion that numpy is “the worst array language other than all the other array languages” feels correct. Maybe a lack of imagination on my part.

jampekka · 49d ago

Numpy was about N-dimensional arrays from the getgo. It was a continuation of numarray, which was Nd.

threeducks · 49d ago

I thought I'd do something smart and inline all the matrix multiplications into the einsums of the vectorized multi-head attention implementation from the article and set optimize="optimal" to make use of the optimal matrix chain multiplication algorithm https://en.wikipedia.org/wiki/Matrix_chain_multiplication to get a nice performance boost.

    def multi_head_attention_golfed(X, W_q, W_k, W_v, W_o, optimize="optimal"):
        scores = np.einsum('si,hij,tm,hmj->hst', X, W_q, X, W_k, optimize=optimize)
        weights = softmax(W_k.shape[-1]**-0.5 * scores, axis=-1)
        projected = np.einsum('hst,ti,hiv,hvd->shd', weights, X, W_v, W_o, optimize=optimize)
        return projected.reshape(X.shape[0], W_v.shape[2])

This is indeed twice as fast as the vectorized implementation, but, disappointingly, the naive implementation with loops is even faster. Here is the code if someone wants to figure out why the performance is like that: https://pastebin.com/raw/peptFyCw

My guess is that einsum could do a better job of considering cache coherency when evaluating the sum.

davedx · 48d ago

> This is indeed twice as fast as the vectorized implementation, but, disappointingly, the naive implementation with loops is even faster.

On CPU or GPU?

kccqzy · 48d ago

This is NumPy we are discussing. It doesn't use the GPU.

threeducks · 47d ago

To be fair, you could replace `import numpy as np` with `import cupy as np` and it would run on GPU without further changes. It is not any good though. PyTorch is roughly 12 times faster.

aborsy · 49d ago

My main problem with numpy is that the syntax is verbose. I know from the programming language perspective it may not be considered a drawback (might even be a strength). But in practice, the syntax is a pain compared to matlab or Julia. The code for the latter is easier to read, understand, consistent with math notation.

nis251413 · 49d ago

The syntax of actual array languages can be beautifully concise and expressive. You can express mathematical formulas in ways that makes sense when you read a single line, and once you get used to the notation and some originally non-intuitive quirks (from a programming background perspective), you can write in very few lines what otherwise would take you several lines of rather ugly, less readable code.

In my view, python+numpy is not actually an array language. Numpy as a library adds vectorization operations to python in order to help with speed. This is different. It does not (intend to) bring the advantages that array language syntax has, even if it was a bit more consistent.

throwaway314155 · 49d ago

Does numpy market itself as an array language or something?

nis251413 · 48d ago

Not that I know of, and I did not claim it does.

throwaway314155 · 48d ago

Guess I'm just curious why you think comparisons to array languages are so naturally apt, when most developers haven't even touched an array language.

nis251413 · 48d ago

I mean, the article and discussion are about numpy's syntax around vectorised code and problems people have with that. Many comments make comparisons with matlab, and I am pointing that a language allowing you do use arrays and write vectorised code, and a language being an array language are not the same thing. Writing vectorised code in a language were everything is based on arrays is in general more natural. The variable definitions are simpler (you do not specify what is array and what is not because everything is an array), and operations tend to work more consistently. Eg in matlab operations are by default done column-wise, because that's how the language is designed and works internally. So functions acting on 2+D arrays act by default column-wise, it does not depend on other context, and they are designed so in order to be faster, not merely consistent to the user. Consistency comes from how arrays are internally represented in memory and the need to have fast code, not just as an arbitrary design choice at the highest level.

Most developers do not touch array languages but I guess most developpers don't in general (need to) vectorise code this way and avoid loops (because they work in other problem domains, use lower level languages etc). If anything, not all problems can be vectorised anyway (or at least elegantly). But if one writes vectorised code, doing that in an array language makes more sense.

aborsy · 48d ago

Yeah. Compare

A = [1, 2; 3, 4];

x = [5; 6];

y = A * x;

with this uglier version:

import numpy as np

A = np.array([[1, 2], [3, 4]])

x = np.array([[5], [6]])

y = A @ x

threeducks · 48d ago

You don't have to wrap the lists in np.array if you use NumPy functions (or if one of the arguments already is a NumPy array, which usually is the case):

    from numpy import *

    A = [[1, 2], [3, 4]]
    x = [[5], [6]]
    y = dot(A, x)

nis251413 · 48d ago

That's nice, but only works, as I understand it, if you use numpy-only functions, which means that you should not use those who denote also base-pythonic, eg +,* etc operations, because then they are interpreted differently. Eg `A + x` gives

    [[1, 2], [3, 4], [5], [6]]

instead of

    array([[ 6,  7],[ 9, 10]])

You have to keep track of the context there to know what you can do and what not I guess, which is not ideal.

dima55 · 49d ago

Hear hear! Some of these complaints have been resolved with numpysane: https://github.com/dkogan/numpysane/ . With numpysane and gnuplotlib, I now find numpy acceptable and use it heavily for everything. But yeah; without these it's unusable.

alanbernstein · 49d ago

Thanks for the link, I have also grumbled about these issues, but never thought to look for a workaround library on top of numpy...

spott · 48d ago

Most of numpysane seems to be doing a loop in Python.

It isn’t true vectorization.

a_t48 · 49d ago

My issue with it is how easy it is to allocate all over the place if you forget to use inplace operations. It's even worse with cupy - rather than applying a series of operations to some data to produce some other data, you end up producing a set of data for each operation. Yes, there are workarounds, but they aren't as ergonomic (cupy.fuse() almost does the right thing, cleanly, but is a step you have to remember to use, and doesn't really work for anything that requires multiple shapes of array).

munchler · 49d ago

I have to agree with this as someone coming from a strongly-typed background (F#). PyTorch and NumPy are very flexible and powerful, but their API’s are insanely underspecified because every function seems to take multiple combinations of vaguely-typed objects. The library just figures out the right thing to do at runtime using broadcasting or other magic.

This kind of “clever” API seems to be both a benefit and curse of the Python ecosystem in general. It makes getting started very easy, but also makes mastery maddeningly difficult.

kccqzy · 49d ago

Broadcasting is a useful and powerful feature. It's precisely specified and easily learned. However, very few type systems in real world languages are powerful enough to express the concept of broadcasting in the type system.

coolcase · 49d ago

Broadcasting is good, but would be nice if it were explicit.

Maybe it being explicit gets in the way and become inelegant once you get some experience.

cycomanic · 49d ago

I sort of agree with the author that N>3 dimensional arrays are cumbersome in numpy, that said I think this is partly because we are really not that great thinking in higher dimensions. I'm interested what the authors solution to the problem is, but unlike the author I'm not a big fan of the eigen notation, but maybe I just don't use it often enough. I don't see the issue with a[:,:,None] notation and that's never given me issues, however I agree about the issue with index arrays. I often make something which think should work and then need to go back to the documentation to realise no that's not how it works.

The inconsistency for argument naming is also annoying (even more if we include scipy), e.g. why is it np.fft.fft(x, axis=1) but np.fft.fftshift(x, axes=1)?!

bear8642 · 49d ago

> we are really not that great thinking in higher dimensions.

Normally when you've hit 4d arrays, are you not dealing with a grid of grids?

That is say you've got a 2×3×4×5 array, another way to think about that is you've got a 2×3 matrix of 4×5 matrices. (APL example to show what I mean: https://tryapl.org/?clear&q=%7B%E2%8D%B5%2C%E2%8D%A5%E2%8A%8...)

cycomanic · 48d ago

I agree in principle, but if you try to do operations about submatrices (and in a way that's what the article is about) it gets tricky quite quickly. I actually often find it even harder to understand if there are multiple nested loops, but that might also be because I use numpy much more often.

thaumasiotes · 48d ago

> That is say you've got a 2×3×4×5 array, another way to think about that is you've got a 2×3 matrix of 4×5 matrices.

No, a grid of grids is the same thing as a grid.

You can lay out some 4×5 matrices within a 2×3 pattern to have something to look at, but the result is significantly less structured than the 4-dimensional space that the array represents.

    ***** ***** *****
    *A*** ***** *****
    ***** ***** *****
    ***** ***** *****
    
    ***** ***** *****
    ***** ***** *****
    ***** ***** *****
    ***** **B** *****

How close are the points A and B? The taxicab distance is 5.

drhagen · 49d ago

> Personally, I think np.einsum is one of the rare parts of NumPy that’s actually good.

einsum only being able to do multiplication makes it quite limited. If we leaned into the Einstein notation (e.g. [1]), we could make something that is both quite nice and quite performant.

[1] https://tensora.drhagen.com/

lvl155 · 49d ago

I am not a big fan of Python data libraries. They’re not cohesive in style across the board. Probably why I found R to be a better “classroom” solution. Julia is nice and so is Mathematica for purely math (and hat tip to Maple).

bee_rider · 49d ago

Numpy is mostly just an interface to BLAS/Lapack, but for Python, right? BLAS/Lapack aren’t clever libraries for doing a ton of small operations, they are simple libraries for doing the easy thing (operations on big dense matrices) about as well as the hardware can.

Numpy is what it is. Seems more disappointing that he had trouble with the more flexible libraries like Jax.

Anyway there’s a clear split between the sort of functionality that Numpy specializes in and the sort that Jax does, and they don’t benefit much from stepping on each other’s toes, right?

sundarurfriend · 49d ago

> D = 1/(LM) np.einsum('klm,ln,km->kn', A, B, C)

The first time I came across Einsums was via the Tullio.jl package, and it seemed like magic to me. I believe the equivalent of this would be:

     @tullio D[k, n] = 1/(L*M) * A[k, l, m] * B[l, n] * C[k, m]

which is really close to the mathematical notation.

To my understanding, template strings from PEP 750 will allow for something like:

    D = 1/(L*M) * np.einsum(t'{A}[k,l,m] * {B}[l,n] * {C}[k,m]')

right? If so, that'd be pretty neat to have.

CreRecombinase · 49d ago

It’s kind of wild how much work really smart people will do to get python to act like Fortran. This is why R is such a great language IMO. Get your data read and arrays in order in dynamic, scheme-like language, then just switch to Fortran and write actual Fortran like an adult.

t-kalinowski · 49d ago

Or, don't even write the fortran manually, just transpile the R function to fortran: https://github.com/t-kalinowski/quickr

_Wintermute · 49d ago

R kinda sucks at anything that isn't a dataframe though.

okanat · 49d ago

R sucks for any real programming tasks that you don't hardcode every single thing. It sucks at loading modules from your own project. It sucks to find the path of the script you're executing.

Bacially everything is stateful in R. You call standard library functions to install third party libraries ffs. And that operation can invoke your C compiler.

Putting R and a repo in a docker container to run it in a pipeline where nothing is hardcoded (unlike our datascience guy's workspace) was the worst nightmare we had to deal with.

dismalaf · 49d ago

This. Writing Fortran is easy as hell nowadays.

But yeah, I learned Fortran to use with R lol. And it is nice. Such easy interop.

harpiaharpyja · 49d ago

I don't use numpy much, but based on the documentation for that function and the fact that you can index by None to add another dimension, it seems like the correct call is:

  y = linalg.solve(A,x[:,:,None])

The documentation given says that A should be (..., M, M), and x should be (..., M, K). So if A is 100x5x5 and x is 100x5, then all you need to do is convert x to 100x5x1.

Is that right? It doesn't seem that bad.

geysersam · 49d ago

You're right but if you want `y` to have the same shape as `x` I think you also need to slice it after

    y = linalg.solve(A, x[..., None])[..., 0]

I don't really mind numpy syntax much, it gets the job done in most scenarios. Numba complements it really well when the code is easier to express like a couple of nested loops.

I used Julia for a while, but found it easier to predict the performance of python+numpy+numba, Julia has a few footguns and the python ecosystem is just insanely polished.

huqedato · 49d ago

Try Julia's built-in arrays or LinearAlgebra standard library handles matrix operations. You'll never go back to python (my case).

DrFalkyn · 49d ago

Back when our lab transitioned from Matlab to Python I used numpy/scipy quite a bit. I remember heavily leaning on numpy.reshape to get things to work correctly. In some cases I did resort to looping.

zahlman · 49d ago

TFA keeps repeating "you can't use loops", but aren't they, like, merely less performant? I understand that there are going to be people out there doing complex algorithms (perhaps part of an ML system) where that performance is crucial and you might as well not be using NumPy in the first place if you skip any opportunities to do things in The Clever NumPy Way. But say I'm just, like, processing a video frame by frame, by using TCNW on each frame and iterating over the time dimension; surely that won't matter?

Also: TIL you can apparently use multi-dimensional NumPy arrays as NumPy array indexers, and they don't just collapse into 1-dimensional iterables. I expected `A[:,i,j,:]` not to work, or to be the same as if `j` were just `(0, 1)`. But instead, it apparently causes transposition with the previous dimension... ?

jerf · 49d ago

You can draw out a sort of performance hierachy, from fastest to slowest:

    * Optimized GPU code
    * CPU vectorized code
    * Static CPU unvectorized code
    * Dynamic CPU code

where the last one refers to the fact that a language like Python, in order to add two numbers together in its native, pure-Python mode, does a lot of boxing, unboxing, resolving of class types and checking for overrides, etc.

Each of those is at least an order of magnitude slower than the next one up the hierarchy, and most of them appreciably more than one. You're probably closer to think of them as more like 1.5 orders of magnitude as a sort of back-of-the-envelope understanding.

Using NumPy incorrectly can accidentally take you from the top one, all the way to the bottom one, in one fell swoop. That can be a big deal, real quick. Or real slow, as the case may be.

In more complicated scenarios, it matters how much computation is going how far down that hierarchy. If by "processing a video frame by frame" you mean something like "I wrote a for loop on the frames but all the math is still in NumPy", you've taken "iterating on frames" from the top to the bottom, but who cares, Python can iterate on even a million things plenty quickly, especially with everything else that is going on. If, by constrast, you mean that at some point you're iterating over each pixel in pure Python, you just fell all the way down that hierarchy for each pixel and you're in bigger trouble.

In my opinionated opinion, the trouble isn't so much that it's possible to fall down that stack. That is arguably a feature, after all; surely we should have the capability of doing that sort of thing if we want. The problem is how easy it is to do without realizing it, just by using Python in what looks like perfectly sensible ways. If you aren't a systems engineer it can be hard to tell you've fallen, and even if you are honestly the docs don't make it particularly easy to figure out.

coolcase · 49d ago

Plus it isn't a checkbox on a UI where Electon being 1000 times slower (1ms instead of 1micro) would be noticeable.

It could be a 12 hour run vs. 12000000 hour run.

HdS84 · 49d ago

As a simple example : once upon a time a We needed to generate a sort of heat map. Doing it in pure python takes a few seconds at the desired size (few thousand cells where each cell needs a small formula). Dropping to numpy braucht that downs to hundreds of milliseconds. Pushing it to pure c got us to tens of milliseconds.

DrFalkyn · 49d ago

Yeah one of other beauties of numpy is you can pass data to/from native shared libraries compiled from C code with little overhead. This was more klidgy in Matlab last I checked

maximilianroos · 49d ago

that's a great hierarchy!

though what does "static cpu" vs "dynamic cpu" mean? it's one thing to be pointer chasing and missing the cache like OCaml can, it's another to be running a full interpreter loop to add two numbers like python does

jerf · 48d ago

That's what it means, basically. I draw a distinction between static code like C++ or Rust may generate and code like what Python may generate.

There is a middle ground of languages that box everything, but lack the rich complexity of Python or Ruby, such as Erlang, and I believe, O'Caml if you aren't semi-carefully programming for performance, that fits fairly cleanly into the middle ground between them. However compared to the uptake of static languages on the one side and the full dynamic scripting languages on the other, these are relatively speaking less common and don't get their own separate "in between" tier in my head, as it would end up being a .75-ish tier that would break the pattern. That is not to say they are bad or uninteresting, there's plenty of interesting languages there, they just aren't as popular.

yongjik · 49d ago

"merely less performant" is severely underselling it. It could easily add a few zeros to your execution time.

(And that's before you even consider GPUs.)

KeplerBoy · 49d ago

It's a slippery slope. Sometimes a python loop outside some numpy logic is fine but it's insane how much perf you can leave on the table if you overdo it.

It's not just python adding interpretor overhead, you also risk creating a lot of temporary arrays i.e. costly mallocs and memcopies.

nyeah · 49d ago

Right, you can use loops. But then it goes much slower than a GPU permits.

cycomanic · 49d ago

But once you need to use the GPU you need to go to another framework anyway (e.g. jax, tensorflow, arrayfire, numba...). AFAIK many of those can parallise loops using their jit functionality (in fact, e.g. numbas jit for a long time could not deal with numpy broadcasing, so you had to write out your loops). So you're not really running into a problem?

zahlman · 49d ago

My point is that plenty of people use NumPy for reasons that have nothing to do with a GPU.

crazygringo · 49d ago

The whole point of NumPy is to make things much, much faster than interpreted Python, whether you're GPU-accelerated or not.

Even code you write now, you may need to GPU accelerate later, as your simulations grow.

Falling back on loops is against the entire reason of using NumPy in the first place.

nyeah · 49d ago

I really disagree. That's not the only point of NumPy. A lot of people use it like Matlab, to answer questions with minimal coding time, not minimal runtime.

crazygringo · 49d ago

I mean sure, the fact that it is performant means tons of functionality is built on it that is hard to find elsewhere.

But the point is still that the main purpose in building it was to be performant. To be accelerated. Even if that's not why you're personally using it.

I mean, I use my M4 Mac's Spotlight to do simple arithmetic. That's not the main point in building the M4 chip though.

nyeah · 49d ago

As best I can see, you weren't originally talking about the reason for creating NumPy. Instead you seemed to be talking about the reason for using NumPy.

zahlman · 49d ago

I used it once purely because I figured out that "turn a sequence of per-tile bitmap data into tiles, then produce the image corresponding to those tiles in order" is equivalent to swapping the two inner dimensions of a four-dimensional array. (And, therefore, so is the opposite operation.) The task was extremely non-performance-critical.

Of course I wasn't happy about bringing in a massive dependency just to simplify a few lines of code. Hopefully one day I'll have a slimmer alternative, perhaps one that isn't particularly concerned with optimization.

nyeah · 49d ago

I mean yes. Also in your example where you hardly spend any time running Python code, the performance difference likely wouldn't matter.

albertzeyer · 48d ago

It seems the main complaint is about confusing shapes / dimensions. Xarray has already be mentioned, but this is a broader concept, called named dimensions, sometimes also named tensors, named axes, labeled tensors, etc, which has often been proposed before, and many implementations exists.

https://nlp.seas.harvard.edu/NamedTensor

https://namedtensor.github.io/

https://docs.pytorch.org/docs/stable/named_tensor.html

https://github.com/JuliaArrays/AxisArrays.jl

https://github.com/ofnote/tsalib

https://github.com/google-deepmind/tensor_annotations

https://github.com/google-deepmind/penzai

https://github.com/tensorflow/mesh/

https://github.com/facebookresearch/torchdim

https://xarray.dev/

https://returnn.readthedocs.io/en/latest/getting_started/dat... (that's my own development; unfortunately the doc is very much outdated on that)

teleforce · 48d ago

Someone just need to write a new named multi-dimentional array processing library in Dlang with Mir since it's more intuitive and faster than venerable Fortran library that's being widely used by array languages including Matlab, Julia, etc [1], based the implementation on the methodology presented in the seminal book by Hart [2], problem solved.

[1] Numeric age for D: Mir GLAS is faster than OpenBLAS and Eigen (2016):

http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/...

[2] Multidimensional Analysis: Algebras and Systems for Science and Engineering (1993):

https://link.springer.com/book/10.1007/978-1-4612-4208-6

hatthew · 49d ago

Am I the only one who feels like this is a rant where the author is projecting their unfamiliarity with numpy? Most of the examples seem like straw men, and I can't tell if that's because they are or if the author genuinely thinks that these are all significant problems.

For example, in the linalg.solve problem, based on reading the documentation for <60 seconds I had two guesses for what would work, and from experimenting it looks like both work equally. If you don't want to take the time to understand the documentation or experiment to see what works, then just write the for loop.

For indexing problems, how about just don't use weird indexing techniques if you don't understand them? I have literally never needed to use a list of lists to index an array in my years of using numpy. And if you do need to use it for some reason, spend 1 minute to test out what happens. "What shape does B have?" is a question that can be answered with `print(B.shape)`. If the concern is about reading code written by other people, then context should make it clear what's happening, and if it's not clear then that's the fault of the person who wrote the code for not using sensible variable names/comments.

kccqzy · 49d ago

I agree this is a rant. I don't think the author is projecting their unfamiliarity here, but they are complaining that numpy is too difficult to learn, especially that style where you avoid all Python loops.

hatthew · 48d ago

I think my comment on projection was prompted by "which of these is right? No one knows." and "Read that. Meditate on it. Now, notice: You still don’t know how to solve..." when the documentation seems relatively clear (at least as clear as something can be when it's handling 3+ dimensional arrays)

coderenegade · 48d ago

Phrasing the indexing problem as a "list of lists of indices" actually makes the behavior pretty intuitive. It's probably an ill-specified goal, but if you wanted to do it, that's probably what it should look like.

rrr_oh_man · 49d ago

Coming from the excellent tidyverse or even data.table in R, Numpy always has felt like twenty steps back into mindfuck territory.

jampekka · 49d ago

Numpy and tidyverse/data.table are not really on the same level of abstraction. Something like Pandas would be (and it definitely has its warts).

Doing the lower level stuff that NumPy is used for is really mindfuck territory in R.

acc_297 · 49d ago

There's also the fantastic "tidytable" library. I wouldn't want to implement multi-headed self attention in either of those libraries though.

I've done only very simple ML stuff using R and it never feels like exactly the right tool.

frakt0x90 · 49d ago

I had to write a normal codebase in R exactly one time and it was one of the weirdest and most unpleasant coding experiences I've had. After that, I decided R is tidyverse and a handful of stats libraries. If you need more, reach for another tool.

lottin · 49d ago

I never understood the appeal of tidyverse. I have a colleague who, like you, thinks that R is tidyverse. He also has the nasty habit of starting Excel formulas with a '+' sign.

jampekka · 49d ago

The main attraction of tidyverse is that it's easy to copy-paste code for common cases. If there's no ready recipe for something, it's usually just not done, or some "technical person" is called to do it.

R is used mostly as a fancy calculator, which is fine in itself, but it makes the comparisons to general purpose languages like Python quite apples-to-oranges.

Imnimo · 49d ago

The trouble is that I can never tell when the cause of my issues is "numpy's syntax/documentation is bad" and when it's "I am very bad at thinking about broadcasting and shape arithmetic".

WD-42 · 49d ago

The answer to all these complaints is simple: use APL. Or rather these days, BQN.

Qem · 48d ago

There is at least one project that managed to take the Iverson ghost[1] from Numpy and actually implement a complete array language[2] on top of it, so there is no need to stray far away from Python's ecosystem.

Previous discussion: https://news.ycombinator.com/item?id=42293723

[1] https://analyzethedatanotthedrivel.org/2018/03/31/numpy-anot...

[2] https://github.com/briangu/klongpy

poulpy123 · 48d ago

LMAO, yes let's replace a language with some issue with something totally unreadable

actinium226 · 48d ago

For the first example:

    import numpy as np
    A = np.random.random((100, 5, 5))
    b = np.random.random((100, 5, 1))
    x = np.linalg.solve(A, b)

Admittedly the documentation is a little hard to read if you just look at 'type' for b, but then it says "Returned shape is (..., M) if b is shape (M,) and (..., M, K) if b is (..., M, K) where ... is broadcasted between a and b"

So you have to make sure your b vector is actually a matrix and not a vector (if K=1 for your case).

theLiminator · 49d ago

Anyone use xarray? Curious how it compares?

xg15 · 49d ago

Looks great, but doesn't support pytorch yet, only numpy, or does it?

https://github.com/pydata/xarray/issues/3232

apetrov · 49d ago

my experience was positive in context of pymc and arviz (as a way to represent posterior and posterior predictive distributions). API is sane and quite easy to get around.

zdevito · 49d ago

I tried to do something similar with 'first-class' dimension objects in PyTorch https://github.com/pytorch/pytorch/blob/main/functorch/dim/R... . For instance multi-head attention looks like:

    from torchdim import softmax
    def multiheadattention(q, k, v, num_attention_heads, dropout_prob, use_positional_embedding):
        batch, query_sequence, key_sequence, heads, features = dims(5)
        heads.size = num_attention_heads
    
        # binding dimensions, and unflattening the heads from the feature dimension
        q = q[batch, query_sequence, [heads, features]]
        k = k[batch, key_sequence, [heads, features]]
        v = v[batch, key_sequence, [heads, features]]
    
        # einsum-style operators to calculate scores,
        attention_scores = (q*k).sum(features) * (features.size ** -0.5)
    
        # use first-class dim to specify dimension for softmax
        attention_probs = softmax(attention_scores, dim=key_sequence)
    
        # dropout work pointwise, following Rule #1
        attention_probs = torch.nn.functional.dropout(attention_probs, p=dropout_prob)
    
        # another matrix product
        context_layer = (attention_probs*v).sum(key_sequence)
    
        # flatten heads back into features
        return context_layer.order(batch, query_sequence, [heads, features])

However, my impression trying to get a wider userbase is that while numpy-style APIs maybe are not as good as some better array language, they might not be the bottleneck for getting things done in PyTorch. However, other domains might suffer more, and I am very excited to see a better array language catch on.

janalsncm · 49d ago

I would like to be able to turn off implicit broadcasting entirely. It has caused me so many evil bugs (PyTorch, but the same thing applies to Numpy).

Imagine all of the weirdness of js “truthiness” bugs (with silent unexpected behavior) combined with the inherent difficulty of debugging matrix multiplications in 2, 3, or 4 dimensions. I would rather herd goats in a Lumon basement.

Jax can do it but I’m not willing to migrate everything.

semiinfinitely · 49d ago

all issues raised are addressed by jax.vmap

patrickkidger · 49d ago

Went scrolling looking for this! Most of the article is about problems solved in JAX.

Also worth noting the Array API standard exists now. This is generally also trying to straighten out the sharp edges.

amensch · 49d ago

Same here, beautiful solution

FrameworkFred · 49d ago

I'm actually super-interested to see the next post.

TBH, if you would've asked me yesterday if I'm the sort of person who might get sucked in by a cliffhanger story about a numpy replacement, I'm pretty sure I would've been an emphatic no.

But I have, in fact, just tried random things in numpy until something worked...so, you know...tell me more...

__mharrison__ · 49d ago

Multiple dimensions (more than 2) are hard.

I was at a conference the other day, and my friend (a very smart professor) was asking me if it would be possible to move away from Xarray to Pandas or Polars...

Perhaps using Numba or Cython (with loops) might make it fast but less prone to confusion.

Luckily for me, I mostly stay in tabular data (< 3 dimensions).

fscknumpy · 49d ago

I had to make a HN account just to show this numpy "feature".

Create an array like this np.zeros((10, 20, 30)), with shape (10, 20, 30).

Clearly if you ask for array[0][:, [0,1]].shape you will get an output of shape (20, 2), because you first remove the first dimension, and then : broadcasts across the second dimension, and you get the first two elements of the third dimension.

Also, obviously we know that in numpy array[a][b][c] == array[a,b,c]. Therefore, what do you think array[0, :, [0,1]].shape returns? (20, 2)? Nope. (2, 20).

Why? Nobody knows.

thaumasiotes · 48d ago

Well, based on the document complained about in the post...

> The broadcasting mechanism permits index arrays to be combined with scalars for other indices. The effect is that the scalar value is used for all the corresponding values of the index arrays

Because you're using advanced indexing in the second example, 0 is an array of shape 2, [0, 0], not a scalar value.

So the first thing that happens is that you select two coordinates in `array`, those being (1) where i=0 and k=0, and (2) where i=0 and k=1.

Then,

> In general, the shape of the resultant array will be the concatenation of the shape of the index array (or the shape that all the index arrays were broadcast to) with the shape of any unused dimensions (those not indexed) in the array being indexed.

You didn't use the j dimension. The value at each of the two coordinates you selected is a size 20 array, so your result is an array with two rows, those being the two arrays you pointed to.

----

Your example appears to be identical to indexing example D from the article. Example C looks harder to explain.

QuadmasterXLII · 49d ago

Numpy and torch have opposite conventions as to which end to broadcast from if arrays hve different numbers of dimensions, and as a result I completely forgot both. At this point I have to always None both arrays to the same dimension, which is ugly but actually fairly explicit

ryandrake · 48d ago

Since this has turned into a session of “my problem with numpy is…” then I’ll add: import time. I have some short scripts whose business logic takes no time, so importing a dependency that imports numpy turns out to take a significant amount of the script’s runtime.

frollogaston · 49d ago

About optimization, I feel like NumPy is meant to be a de facto standard and reference implementation. It covers all use cases with decent efficiency, not the fastest way possible. There are more limited drop-in replacements that use more CPU parallelism or GPU if NumPy isn't fast enough for your use case. Just wish it were clearer which NumPy build I'm installing, cause apparently `pip3 install numpy` on my Mac gave me something built with the worst flags possible.

About >2 dimensions, I always found this confusing in NumPy but just chalked it up to >2 dim arrays being inherently confusing. Maybe there really is a better way.

ris · 49d ago

I would much much prefer the cited "awful" syntax over the proposed loop syntax any day. Don't make me run a little virtual machine in my head to figure out what the end result of a block of code is going to be.

constantcrying · 48d ago

One of the biggest hurdles of numpy is python, which is very clearly not a language designed for numerical analysis.

Comparing it to Julia and Matlab you can see many, many cases where the python design just does not fit.

odyssey7 · 47d ago

Python in general and its ecosystem are some of the biggest roadblocks to building real-world AI systems, in my opinion.

Python is a scripting language that makes it easy to produce student-quality work. Academia adopted it for its pedagogical utility.

Unfortunately, people made the assumption that, because universities teach it, it's well-suited for production systems.

I'd argue that Python doesn't scale very well for codebases much larger than a homework assignment.

ebonnafoux · 48d ago

What I would really like is to type check the shape of a numpy array to prevent dimension array at runtime. You could use third party library like https://github.com/ramonhagenaars/nptyping or https://github.com/beartype/beartype#numpy-arrays but it will not extend to the methode of Numpy.

flhammer · 48d ago

Maybe you are looking for something like this? https://news.ycombinator.com/item?id=40022628

But not sure how far this gets you with regards to the pain points mentioned in this post.

bee_rider · 49d ago

A dumb thought: technically scipy has a “solve_banded” function that does a banded solve. He could easily recast his problem into a single big diagonal problem, I guess, just with some silly explicit zeros added. I wonder how the performance of that would compare, to iterating over a bunch of tiny solves.

Of course, it would be nice if scipy had a block diagonal solver (maybe it does?). But yea, I mean, of course it would be nice if my problem was always built in functionality of the library, lol.

Maybe a bsr_matrix could work.

wedesoft · 48d ago

Implementing array operations basically requires macros and JIT compilation. I have implemented JIT compilation of tensor operations in Scheme using LLVM. Maybe I should redo it in Clojure, but I think, most programmers don't care enough, to leave the Python ecosystem. https://wedesoft.github.io/aiscm/operation.html

jamesblonde · 49d ago

In Data for ML, everything has switch from NumPy (Pandas) to Arrow (Polars, DuckDB, Spark, Pandas 2.x, etc). However, Scikit-Learn is still a hold out, so it's Arrow from you data sources all to way to pre-processing pipelines in Scikit-Learn when you have to go back to NumPy. In practice, it now makes more sense to separate feature pipelines in Arrow from training pipelines with Pandas/NumPy and Scikit-Learn.*

*This is ML, not Deep Learning or Transformers.

kccqzy · 48d ago

Most Arrow arrays can be transformed into numpy arrays in a zero-copy manner. And having used both, I personally think Arrow is way more buggy than numpy: PyArrow segfaults for me about once a month when writing pure Python; numpy never segfaulted on me.

phronimos · 49d ago

Numba is a great option for speeding up (vectorizing) loops and NumPy code, apart from CuPy and JAX. Xarray is also worth trying for tensors beyond 2 dimensions.

KeplerBoy · 49d ago

true, a nice jit compiler solves a lot of the problems mentioned in the article. These days i often use jax.jit for the gpu support and numpy like syntax with the added benefit of fast loop constructs.

complex_pi · 49d ago

NumPy allows a lot of science to happen. Grievance is fine but a little respect is as well.

NumPy is the lingua franca for storing and passing arrays in memory in Python.

Thank you NumPy!

rekenaut · 49d ago

This is great aspect of it, but that doesn’t diminish that it feels so tedious to work with compared to Julia and Matlab. Some of that is just from trying to shoehorn Python as a scientific computing language, but I think it’s time to revisit whether Python should have first party support for vectorization, arrays in memory, and broadcasting as Julia and Matlab have.

complex_pi · 49d ago

NumPy allowed a PEP for the in-memory representation of arrays in Python. This is tremendous useful for making APIs.

jampekka · 49d ago

I've never understood the "so tedious" argument of Python vs Matlab. Sure, having to import numpy and use np.array to create numpy arrays is a bit typing, but other than that I don't see major differences.

Given what inconsistent mess Matlab is aside from matrix manipulation, I take NumPy any day.

sevensor · 48d ago

Where are these people who use Matlab properly? All I ever see in other people’s Matlab code is for loops. Their idea of speeding up their code is using parfor.

Obligatory licensing rant: you have to pay extra to use parfor, as I found out one day when the license server at work told me I couldn’t run somebody else’s code because we were out of licenses for that iteration construct. This is just Mathworks taking advantage of ignorant people who don’t understand what Matlab is to inflict more misery with their insane licensing scheme. It’s not just that it’s expensive, it’s that it’s weirdly unpredictable. I mean, come on, the code was slow and chock full of errors as it was. To have it randomly fail because the license server was out of parallel toolbox licenses is just insulting.

tptacek · 49d ago

I get the impression the author is pretty familiar with what NumPy is.

jampekka · 49d ago

And given the ending of TFA the author would likely agree with GP.

nayuki · 49d ago

I skimmed the article and agree with it at a high level, though I haven't faced most of those problems personally.

I have several gripes with NumPy, or more broadly the notion of using Python to call a C/asm library that vectorizes math operations. A lot of people speak of NumPy like the solution to all your high-performance math needs in Python, which I think is disingenuous. The more custom logic you do, the less suitable the tools get. Pure Python numeric code is incredibly slow - like 1/30× compared to Java - and as you find parts that can't be vectorized, you have to drop back down to pure Python.

I would like to give the simple example of the sieve of Eratosthenes:

  def sieve_primeness(limit):
      result = [False] + [True] * limit
      for i in range(2, len(result)):
          if result[i]:
              for j in range(i * i, len(result), i):
                  result[j] = False

This code is scalar, and porting it to a language like C/C++/Rust/Java gives decent performance straight out of the box. Performance in Python is about 1/30× the speed, which is not acceptable.

People pretend that you can hand-wave the performance problem away by using NumPy. Please tell me how to vectorize this Python code. Go on, I'm waiting.

You can't vectorize the `if result[i]` because that controls whether the inner for-loop executes; so it must execute in pure Python. For the inner loop, you can vectorize that by creating a huge temporary array and then AND it with the result array, but that is extremely memory-inefficient compared to flipping bits of the result array in place, and probably messes up the cache too.

Alternatively, you can run the code in PyPy, but that usually gives a speed-up of maybe 3×, which is nowhere near enough to recover the 30× speed penalty.

Another example is that NumPy is not going to help you implement your own bigint library, because that also requires a lot of custom logic that executes between loop iterations. Meanwhile, I've implemented bigint in pure Java with acceptable performance and without issues.

Again, my point is that anytime you venture into custom numerical/logical code that is not the simple `C = A + B`, you enter a world of pain when it comes to Python performance or vectorization.

nonameiguess · 49d ago

Ironically enough, I wrote a Sieve in NumPy 12 years ago or so in grad school messing around with Project Euler. I haven't touched it since and have not regularly used either NumPy or even Python in quite some time, but I still have the solutions in a private Github repo. The code is:

  def primes(n):
      """Input: A natural number n.
         Output: An array of primes, p < n."""
      assert n >= 2
      sieve = np.ones(n / 2, dtype=np.bool)
      for i in range(3, int(n ** 0.5) + 1, 2):
          if sieve[i / 2]:
              sieve[i * i / 2::i] = False
      return np.r_[2, 2 * np.nonzero(sieve)[0][1::] + 1]

It still relies on looping and I have no idea if you can fully vectorize everything, but it does at least get quite a bit of speedup thanks to the ability to index one array using another, so I can avoid the inner loop you have to use in pure Python.

I have absolutely no clue what that final line is doing and I'm not sure I understood it 12 years ago, either.

gjm11 · 49d ago

sieve[i] is nonzero precisely when 2i+1 is an odd prime or i=0 (because it's initialized to all-1 and nothing ever zeros the first element).

So np.nonzero(sieve) is all the indices for which i=0 or 2i+1 is an odd prime.

Except that the behaviour of np.nonzero is a bit weird: it returns a tuple of arrays, one for each axis. (So, e.g., if you feed it a 2d array it gives you (a,b) such that the nonzero elements are at positions (a[0],b[0]), (a[1],b[1]), etc.) That's kinda reasonable behaviour, but it means that for the simple case of a 1-d array you get a 1-element tuple.

So np.nonzero(sieve)[0] is actually all the indices i for which i=0 or 2i+1 is an odd prime, and now you know why I didn't write "all the indices i for which ..." above.

The case i=0 would correspond to 1 being prime, which it isn't considered to be. So we throw that one out.

So np.nonzero(sieve)[0][1::] is all the indices i for which 2i+1 is an odd prime.

And so 2*np.nonzero(sieve)[0][1::]+1 is all the 2i+1 for which 2i+1 is an odd prime; that is, all the odd primes.

I don't think I've encountered np.r_ before, but it's pretty obvious what it's doing: prepending 2 to that array, so now it's "2 followed by all the odd primes" or, more simply, all the primes.

(Obviously, in the above phrases like "all the primes" mean "all the primes up to the limit defined by the size of the array".)

Jtsummers · 49d ago

I'm not a numpy expert, I just use it on occasion but this works and eliminates the explicit inner loop, but not the outer loop. It collects the list of prime numbers while removing multiples of the prime from the numpy array of numbers:

import numpy as np

  def sieve_primes(limit):
    nums = np.arange(limit + 1)
    nums[1] = 0
    primes = []
    for i in range(2, limit):
      if nums[i] != 0: # could just be `if nums[i]:` since 0 is false-y
        primes.append(i)
        nums = nums * (np.mod(nums, i) != 0)
    print(primes)
  sieve_primes(1000)

This takes advantage of the fact that True and False are treated as 1 and 0 in Python for the multiplication.

EDIT: The other solutions people have shown are closer to the sieve itself than my solution. I was having trouble with the slicing notation, now that I've seen how that should be done I'd redo the above as:

  def sieve_primes(limit):
    nums = np.arange(limit + 1)
    nums[1] = 0
    for i in range(2, limit):
      if nums[i] != 0:
        nums[i+i::i] = 0
    print(np.nonzero(nums))

However, the first version is faster by my testing so I'd end up just going back to it if I really cared about performance.

WCSTombs · 49d ago

> Please tell me how to vectorize this Python code. Go on, I'm waiting.

Here's a start:

    import numpy as np

    def sieve(limit):
        result = np.ones(limit + 2, dtype=bool)
        result[0] = False
        result[1] = False
        for i in range(2, limit + 1):
            if result[i]:
                yield i
                result[::i] = False

    for prime in sieve(100):
        print(prime)

As you said, you can't vectorize the outer loop because each iteration depends on the result of previous iterations, so I think you can't do too much better than that with this algorithm. (There are a few easy algorithm optimizations, but I think that's kind of orthogonal to your point, and anyway you can do them in any language.)

I would expect there's still a significant performance gap between that and, say, a simple C implementation, but that shouldn't be surprising, and personally I haven't encountered these supposed droves of people claiming that NumPy fully solves the performance gap. From what I've seen there was a general consensus that vectorizing with NumPy can significantly tighten the gap but can rarely fully close it.

thaumasiotes · 47d ago

> As you said, you can't vectorize the outer loop because each iteration depends on the result of previous iterations

I don't see why not. That check for whether result[i] is prime is just a performance optimization. If you want to do a different performance optimization, you can do that without hurting anything.

Note that while it's true that multiples of prime numbers are composite, it's also true that multiples of composite numbers are composite.

brosco · 49d ago

Have you heard of JIT libraries like numba (https://github.com/numba/numba)? It doesn't work for all python code, but can be helpful for the type of function you gave as an example. There's no need to rewrite anything, just add a decorator to the function. I don't really know how performance compares to C, for example.

xg15 · 49d ago

> Please tell me how to vectorize this Python code. Go on, I'm waiting.

You can't entirely get rid of the loops, but at least the inner one can be vectorized, I think. That would reduce the "pythonic runtime" from squared to linear.

Maybe something like this?

  result = np.full((limit+1,), True, dtype=bool)
  result[0] = False
  indices = np.arange(len(result))
  for i in range(2, len(result)):
    result[indices[i:] * i] = False

This is just out of my head, so may contain bugs. Also, the indices array may need to be clamped, not sure right now if numpy accepts out-of-bounds indices.

robert-brown · 48d ago

One big problem is that Python's semantics make it difficult to implement the language efficiently. When I translate the sieve code into safe Common Lisp, I get a speedup of 15. After adding a few declarations, making the code as unsafe as C, the speedup factor is 30. Those improvements are possible because the compiler can assume many things about the functions and operators used in the code. For instance, that the Lisp equivalents of range() and len() have not been redefined.

__mharrison__ · 49d ago

See Numba or Cython...

raydiak · 48d ago

I vaguely recall having a few similar complaints about Perl's PDL back in the day. Makes me wonder why we can't also support something closer to syntactically loop-esque constructs to express the same things without the need for special slicing/indexing notations with their own implicit semantics.

hodder · 48d ago

When I worked as a quant in trading risk, we used MATLAB and I always questioned the reasoning behind it... thinking it was mostly just historical plus the slick IDE, but it also makes this kind of thing dead simple. Vectorized code is just the defaullt.

Pompidou · 48d ago

Array programming requires true array languages like apl and j.

nuc1e0n · 48d ago

I think Numpy is representative of the Python ecosystem as a whole. Powerful, but internally complex, bloated, poorly designed and poorly documented.

aanet · 48d ago

I feel heard... I'm not the only one finding numpy confusing, esp with more than 3 dimensions.

the_clarence · 48d ago

Why do people use numpy instead of sage?

eurekin · 49d ago

That's one area LLMs hugely helped. You can just ask it and ask to generate tests to verify

make3 · 47d ago

jax's vmap and jitting were attempts to improve things like what the author describes about solve. Pytorch had attempts to integrate this. Not sure what came out of it

coolcase · 49d ago

Sounds like hammers and nails problem. Your nail gun is called PyTorch.

josefrichter · 49d ago

You could try Elixir’s Nx and related ecosystem.

kccqzy · 49d ago

I have an unpopular opinion: I don't like numpy.einsum because it is too different from the rest of numpy. You label your axes with letters but none of the other regular numpy functions do that. I usually avoid using numpy.einsum in favor of using a combination of indexing notation with numpy.newaxis (None), broadcasting, and numpy.swapaxes.

And I learned from someone more senior than me that you should instead label your variables with single-letter axes names. This way, the reader reads regular non-einsum operations and they still have the axes information in their mental model. And when you use numpy.einsum these axes labeling information become duplicated.

srean · 49d ago

I for one like Numpy's broadcast far better than Matlab's. In Numpy it's a logical operation, no unnecessary memory/copying cost.

Last time I checked Matlab (that was surely decade ago) it actually filled memory with copied data.

credit_guy · 49d ago

Here's how you do it: focus on the simple case, solve it, then ask Copilot to vectorize the code. Life is too short.

gus_massa · 48d ago

I never tried that. Can you use this method with the example in the article and post it here? Does it work of Copilot hallucinates magical functions? (I'd be worry if I ever have to fix the code.)

credit_guy · 48d ago

"Here’s how you can create a test case to compare the performance of the for-loop method and the vectorized method using numpy.linalg.solve. I will provide the Python code for you to test this."

    import numpy as np
    import time

    # Create random test data
    np.random.seed(42)  # For reproducibility
    A = np.random.rand(100, 5, 5)  # 100 random 5x5 matrices
    x = np.random.rand(100, 5)     # 100 random vectors of length 5

    # Method 1: For-loop approach
    start_time = time.time()
    y_loop = np.empty_like(x)
    for i in range(100):
        y_loop[i, :] = np.linalg.solve(A[i, :, :], x[i, :])
    loop_time = time.time() - start_time

    # Method 2: Vectorized approach
    start_time = time.time()
    y_vectorized = np.linalg.solve(A, x)
    vectorized_time = time.time() - start_time

    # Verify correctness
    correctness = np.allclose(y_loop, y_vectorized)

    # Print results
    print(f"For-loop method time: {loop_time:.6f} seconds")
    print(f"Vectorized method time: {vectorized_time:.6f} seconds")
    print(f"Results are {'correct' if correctness else 'incorrect'}")

For-loop method time: 0.001004 seconds Vectorized method time: 0.000000 seconds Results are correct

adipandas · 48d ago

I completely agree with you mate.

sunrunner · 49d ago

"Young man, in numpy you don't understand things. You just get used to them." -- John von Neumann (probably)

xg15 · 49d ago

I'm pretty sure it was Einstein.

sunrunner · 49d ago

Darn, I thought it was Feynman and checked it beforehand. Maybe it was Von Neumann quoting Einstein (himself quoting...)

gjm11 · 49d ago

I think xg15 was just making a joke about the tendency for every quotation that could possibly be assigned to Einstein to be assigned to Einstein.

(Perhaps you too are making a joke and it's gone over my head, in which case my apologies.)

xg15 · 48d ago

Nah, was just making a general joke about fake quotes, as did the GP I think.

JJMcJ · 49d ago

It has a von Neumann sound to it.

the__alchemist · 49d ago

I'm with the author here. Great for simple use cases. Anything beyond that, and I find the syntax inscrutable. It's like using a terse, feature-rich DSL like Regex. Good luck deciphering someone else's numpy code (or your own from a month back) unless you really take the time to commit it to memory.

neonsunset · 49d ago

Sighs in F# and Julia

sundarurfriend · 49d ago

I understand the Julia part, but does F# have any particularly good facilities or libraries for this kind of (fast) array manipulation? It's been getting a good amount of love on HN lately, but I don't know if any specific strengths of it (beyond the general sense of "functional language with access to .NET ecosystem").

int_19h · 49d ago

https://github.com/fsprojects/awesome-fsharp/blob/main/READM...

sundarurfriend · 49d ago

I was more curious why they chose F# among the many many languages out there, and this kind of list unfortunately is not very useful for that - similar lists exist in pretty much every awesome-$language page out there. They don't give us any idea of how robust or featureful the ecosystem and support is, which are the things that would differentiate "this too supports it, kinda" (which is every major language out there) and "this is really well-supported and ergonomic".

betelgeuse6 · 48d ago

Something wrong with that font.

Ask HN: What are the best resources to help with health insurance denials?

Ask HN: How to create a more human-centric web?

Ask HN: I give in, what are the resources for picking up AI-assisted coding?

Ask HN: Freelancer? Seeking freelancer? (July 2025)

Ask HN: Who is hiring? (July 2025)

Super Simple "Hallucination Traps" to detect interview cheaters

Ask HN: How do I prevent execs from obsessing over copy-protection?

Ask HN: What Are You Working On? (June 2025)

Ask HN: Who wants to be hired? (July 2025)

Ask HN: What's the 2025 stack for a self-hosted photo library with local AI?

1KB JavaScript Demoscene Challenge Just Launched

Ask HN: Should I use microservices or monolithic architecture?

Ask HN: Why there is no demand for my SaaS when competition is killing it?

Ask HN: Ideas to acquire "good taste" in programming?

Tell HN: Google says "not vuln", fixes hours later without attribution

Ask HN: Are AI Copilots Eroding Our Programming Skills?

Ask HN: How to Block Spam Mails?

Ask HN: Why privacy consent is NOT part of Browser setting?

Ask HN: How do I open up my side project to the world?

Ask HN: 80s electronics book club; anyone remember this illustrator?

Ask HN: Would limiting game size to 5–10 MB spur the creation of novel games?

Why Are SaaS Boilerplates Still This Expensive? So I Built My Own

Ask HN: How have you shared computers with your young child (~3 to 5)

Ask HN: How did low contrast text become so pervasive?

Ask HN: Startup shutting down, should we open source?

LinkedIn Locked Me Out Until I Submit to Biometric ID Verification via Persona

Ask HN: Anyone is an "AI Engineer"? What does your job tasks include?

Ask HN: Stock Android tablet free of bloatware?

Ask HN: Which Free Software or Open Source Project Needs Help?

Ask HN: Is noprocrast still working for you?

It is not possible to install your own addon in Firefox without Moz's approval

Tell HN: (dictionary|thesaurus).reference.com is now a spam site

Ask HN: Who's using AI to build non-AI products?

Harsh Working Environment in Japan

Ask HN: How to find developers interested in open-source concepts?

Border search safe TOTP authenticator app?

Ask HN: Better-auth or Nextauth or something else

Ask HN: Which AI Dev Assistant Are You Using and Why?

I don't like NumPy

Comments (206)