How very interesting. Only a few days ago I was reminiscing about scipy.weave, a horrendous hack and a miracle of productivity in Python at the same time. It let users write inline C++, which would get compiled/cached into ephemeral extension modules. For certain jobs, and for C++ users, it blows numba, cython etc. clean out of the water. It is sadly deprecated and has long been unmaintained. Is this a suitable replacement?
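For those who never saw it, usage looked roughly like this (written from memory, so treat the exact API as approximate; it needs Python 2 and an old SciPy):

    from scipy import weave  # removed from modern SciPy

    a, b = 3, 4
    # The C++ body is passed as a string; on first call weave compiled it
    # into an extension module and cached the result for later calls.
    code = "return_val = a * b;"
    print(weave.inline(code, ['a', 'b']))  # -> 12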
AFAICT this does not quite produce true binaries, but rather interprets C++ via Cling, is that right? And the docs only claim that C++-like speeds are achieved on PyPy. If there are any performance benchmarks for CPython3, I can't find them. That's the real question - few people combine Python and C++ just for the fun of it.
EDIT: some benchmarks are available in this paper, linked from TFA: https://wlav.web.cern.ch/wlav/Cppyy_LavrijsenDutta_PyHPC16.p... But they don't really answer my question. The benchmarks seem to mostly look at the overhead of wrapping C++, rather than comparing against a Python implementation. Some of them involve I/O, which is maybe not so interesting, and some of the benchmarks don't even have a pure CPython implementation. Where they do, the speeds are very close. But then the paper is from 2016; a lot may have changed.
iopapa · 4h ago
If you're not doing anything edge-casey, nanobind[0] is extremely pleasant to use. It's a rewrite of pybind11 specifically designed for the 80/20 use cases, and it solves the long compile times.
I have used it extensively over the last year in atopile[1], if someone is looking for a real-world production code example.
We are using nanobind paired with hatch & scikit-build.
I suggest having a look at the pyproject and src/faebryk/core/cpp.

[0] https://github.com/wjakob/nanobind

[1] https://github.com/atopile/atopile
I love nanobind, use it all the time and highly recommend it. It makes it very easy to pass numpy, PyTorch, cupy or tensorflow arrays to your C++ extension, to specify what array shape and flags are expected, and to wrap either C++ or CUDA code. When paired with scikit-build, it makes building Python packages with C++ extensions a breeze. I would give it more than one star on github if I could.
almostgotcaught · 3h ago
> solves the long compile times
this only goes so far - if you try to eg bind O(10k) methods using nanobind (or pybind ofc) you will be compiling for a very long time. for example, i have a threadripper and with a single large TU (translation unit) it took about 60 minutes (because single TU <-> single thread). i had to "shard" my nanobind source to get down to a "reasonable" ~10 minutes.
beng-nl · 3h ago
I’m a bit surprised (but interested) to read it beats cython (in performance I assume). Cython can - and for performance, should - be written so that loops and code in loops are pure C code without any Python interaction. Even the GIL can be released. Maybe I’m making too many assumptions about the two cases, but in what way do you see cython being beaten given the above?
Thanks!
rich_sasha · 3h ago
IME performant Cython is quite hard to write. Simply renaming your file to *.pyx speeds it up - very much finger in the air - by a factor of 2x on compute-heavy tasks.
Then you sprinkle some cdef around etc. and you get a bit faster again. You rewrite your algo a bit, so it's more in a "stateful C" style, which is not so much the Python way, and it gets a little faster. But not that much.
So then, to make real gains, you have to go into the weeds of what is going on. Look at the Cython bottlenecks, usually the spots where Cython has to fall back to interacting with the Python interpreter. You may go down the rabbit hole of Cython directives, switching off things like overflow checks etc. IME this is a lot of trial and error and isn't always intuitive. All of this is done in a language that, by this point, is superficially similar to Python but might as well not be. The slowness no longer comes from algorithmic logic or Python semantics but from the places where Cython escapes out to the Python interpreter.
At this point, C++ may offer a respite, if you are familiar with the language, because the performance tradeoffs are very obvious in the code right in front of you. You get no head start in terms of Pythonic syntax, but otherwise you are writing pure C++ and it's so much easier to reason about performance.
I would imagine that very well written Cython is close in performance to C++, but for someone who knows a bit of C++ and only occasionally writes Cython, C++ is much easier to make fast.
boothby · 19m ago
I write performant cython all the time, as a glue language. Write your "business logic" in Python. Write your class definitions and heavyweight algorithms in C++. Write your API in Cython. If you're writing your business logic and heavyweight algorithms all in cython, you're in for some misery.
jononor · 4h ago
It is pretty easy and convenient to write extensions using pybind11, including passing numpy arrays. It takes about 10 lines in setup.py and around 10 lines in a .cpp file; then run setup.py build_ext to build it. Not quite the convenience of inline - but in practice pretty nice.
My only nit is that compile time is around 3 seconds on my machine.
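Something like this on the setup.py side (just a sketch using pybind11's bundled setup_helpers; the "example" module and file names are placeholders):

    # setup.py -- build with: python setup.py build_ext --inplace
    from setuptools import setup
    from pybind11.setup_helpers import Pybind11Extension, build_ext

    ext_modules = [
        # example.cpp holds the PYBIND11_MODULE(example, m) definition
        # with the m.def(...) bindings for your C++ functions.
        Pybind11Extension("example", ["example.cpp"]),
    ]

    setup(
        name="example",
        ext_modules=ext_modules,
        cmdclass={"build_ext": build_ext},
    )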
almostgotcaught · 4h ago
the purpose of this tool isn't to "accelerate" python or whatever - it's to bind cpp.
> AFAICT this does not quite produce true binaries, but rather interprets C++ via Cling, is that right?
yes but to be very clear: it's not designed to interpret arbitrary cpp but calls and ctors and field accesses. the point is binding. also it can use cling or clang-repl.
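eg a minimal sketch of how that looks in practice (assuming nothing beyond the documented cppyy.cppdef/cppyy.gbl entry points):

    import cppyy

    # declarations are JIT-compiled via Cling; no prebuilt binary is shipped
    cppyy.cppdef("""
    struct Counter {
        int value = 0;
        void bump(int n) { value += n; }
    };
    """)

    c = cppyy.gbl.Counter()   # ctor
    c.bump(3)                 # method call
    print(c.value)            # field access -> 3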
ashvardanian · 4h ago
In case you are searching for fun use-cases, here's how one can experiment with weird similarity metrics & kNN data-structures via Cppyy (for C++ kernels), Numba (for Python), or PeachPy (for x86 Asm), interacting with a precompiled engine: https://github.com/unum-cloud/usearch/blob/main/python/READM...
If I used cppyy in a project that I then made into a pip package, how would this affect distribution? It sounds like the end-user downloading the code would need a C++ compiler on their system, or does cppyy come with one?
Cppyy – Automatic Python-C++ bindings - https://news.ycombinator.com/item?id=19848450 - May 2019 (22 comments)