I realize I'm talking about C++, not C, but coincidentally just today I ported our 7-year-old library's SWIG/Python interface to nanobind. What a fragile c9k SWIG has been all these years (don't touch it!), and the nanobind version is so refreshing and clean, with lots of type information suddenly available to Python programs. One day of effort and all our tests pass, and nanobind now seems to let us improve the ergonomics of our library (from the Python point of view).
rossant · 3h ago
My visualization library [1] is written in C and exposes a visualization API in C. It is packaged as a Python wheel using auto-generated ctypes bindings, which includes the shared library (so, dylib, or dll) and a few dependencies. This setup works very well, with no need to compile against each Python version. I only need to build it for the supported platforms, which is handled automatically by GitHub Actions. The library is designed to minimize the number of C calls, making the ctypes overhead negligible in practice.
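A minimal sketch of what such ctypes bindings look like in spirit, using libm as a stand-in for the real shared library (the actual datoviz symbols aren't shown in this comment):

```python
import ctypes
import ctypes.util

# Load a shared library by name; ctypes smooths over the per-platform
# naming differences (.so / .dylib / .dll). libm stands in here for the
# real visualization library.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes converts arguments and the return
# value correctly. In C: double cbrt(double);
libm.cbrt.argtypes = [ctypes.c_double]
libm.cbrt.restype = ctypes.c_double

print(libm.cbrt(27.0))  # ≈ 3.0
```

Auto-generating these `argtypes`/`restype` declarations from the C headers is what makes the approach scale to a whole API, and since ctypes talks to the stable C ABI rather than the CPython C API, one wheel works across Python versions.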
This is one of the "killer apps" for Nim. Nim makes it easy to wrap C and easy to talk to Python (via Nimpy).
dexzod · 2h ago
The title of the article is misleading. Making C and Python talk to each other implies both calling Python from C and calling C from Python; the article only covers the former.
eth_hack77 · 4h ago
Thanks a lot for the article. Here's a quick question: did you measure the time of some basic operations in Python vs. C? (E.g., if I run a loop of 10 billion iterations just dividing numbers in C, do the same in Python, and then import these operations into one another as libraries, does anything change?)
I'm a beginner engineer so please don't judge me if my question is not making perfect sense.
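One rough way to see the gap on stock CPython: time a loop run as interpreter bytecode against the same arithmetic executed inside a C-implemented builtin. Exact numbers vary by machine, but the builtin is typically several times faster:

```python
import timeit

N = 1_000_000

def python_loop():
    total = 0
    for i in range(N):
        total += i            # each iteration executes interpreter bytecode
    return total

def c_builtin():
    return sum(range(N))      # the loop runs inside CPython's C code

# Same answer either way; only where the loop executes differs.
assert python_loop() == c_builtin()

t_py = timeit.timeit(python_loop, number=10)
t_c = timeit.timeit(c_builtin, number=10)
print(f"Python loop: {t_py:.3f}s, C builtin: {t_c:.3f}s")
```

The same idea extends to crossing the language boundary: one call that does a lot of work in C amortizes the per-call overhead, while a tight loop of tiny cross-language calls does not.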
bdbenton5255 · 4h ago
C is orders of magnitude faster than Python, and you can measure this with something as simple as nested loops. Python is built for a higher level of abstraction, and this comes at the cost of speed; it's what makes Python so natural and human-like to write in.
xandrius · 4h ago
Syntax has nothing to do with the speed of the language: Python could be "natural" and "human-like" while being much faster, or "unnatural" and "inhuman" while being slower.
bdbenton5255 · 3h ago
It does, actually: the syntax is a result of the language's design, and a simpler, more human-like syntax requires a higher level of abstraction that reduces efficiency.
The design of a language, including its syntax, has a great bearing on its speed and efficiency.
Compare C with assembly, for example, and you will see that higher-level languages take complex actions and condense them into a terser syntax.
You will also observe that languages such as Python are not nearly as suitable for lower-level tasks like writing operating systems, where C is preferred largely because of its speed.
Languages like Python and Ruby include more built-in logic to make writing in them natural and easy, at the cost of efficiency.
johannes1234321 · 3h ago
Then let's look at C++, which in some areas has a higher abstraction level than C but can still be faster than C, due to templates: the library code gets inlined and can then be optimized for the actual types, whereas a C library function taking void pointers requires an indirect function call and compiles to a less optimized form (compare std::sort with qsort).
The main thing about Python being slower is that in most contexts it runs as an interpreted, bytecode-compiled language on its own VM in CPython.
throwaway314155 · 3h ago
Language abstractions that are not "zero-cost" inevitably lead to worse performance. Python has many such abstractions designed to improve developer experience. I think that's all the person you're responding to meant.
bdbenton5255 · 3h ago
Yes, thank you.
vlovich123 · 3h ago
It’s not mainly the syntax, though that has a marginal effect. It’s partly the lack of typing information (which is syntax), but mostly that it runs interpreted. PyPy applies a JIT to generate machine code directly from the Python code and is significantly faster in most cases. Another huge cost is in multi-threaded scenarios, where Python has the GIL (even single-threaded there’s a cost), which is an architectural, not a syntactic, decision.
For example, Rust has a drastically simpler syntax in some ways than C++ (ignoring the borrow annotations). In many ways it can look like Python code. Yet its performance is on par with C++ because it’s AOT compiled and has an explicit type system to support that.
TL;DR: most of Python’s slowness is not syntactic but comes from architectural decisions about how the code is run, which is why alternate Python implementations (IronPython, PyPy) typically run faster.
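The GIL cost mentioned above is easy to observe. A small sketch (timings are machine-dependent; on stock CPython the threaded version is usually no faster for CPU-bound work, since only one thread executes bytecode at a time):

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound work: never releases the GIL voluntarily.
    while n:
        n -= 1

N = 5_000_000

# Sequential: two runs back to back.
start = time.perf_counter()
count_down(N)
count_down(N)
seq = time.perf_counter() - start

# Two threads: with the GIL, the interpreter serializes the bytecode,
# so the wall-clock time is typically no better than sequential.
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
start = time.perf_counter()
t1.start(); t2.start()
t1.join(); t2.join()
par = time.perf_counter() - start

print(f"sequential: {seq:.2f}s, threaded: {par:.2f}s")
```

C extensions sidestep this by releasing the GIL around long-running native sections, which is one reason the NumPy-style "thin Python over fat C" pattern works.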
jebarker · 6h ago
Lots of people argue that AI R&D is currently done in Python because of its rich library ecosystem. This makes me realize that's actually a poor reason for everything to be in Python, since the genuinely useful libraries for things like visualization could easily be called from lower-level languages if they're off the hot path.
bigger_cheese · 6m ago
I have been using Python recently and have found that a lot of the data visualization tools are wrappers around other languages (mostly JavaScript): AgGrid, Tabulator, Plotly, etc.
Sometimes you end up embedding chunks of JavaScript directly inside your Python.
> could easily be called from lower level languages
Could? Yes. Easily? No.
People write their business logic in Python because they don't want to code in those lower-level languages unless they absolutely have to. The article neatly shows the kind of additional coding overhead you'd have to deal with - and you're not getting anything back in return.
Python is successful because it's a high-level language which has the right tooling to create easy-to-use wrappers around low-level high-performance libraries. You get all the benefits of a rich high-level language for the cold path, and you only pay a small penalty over using a low-level language for the hot path.
jebarker · 4h ago
The problem I see (day to day, working on ML framework optimization) is that it's not just a case of Python calling lower-level compiled code. PyTorch, for example, has a much tighter integration of Python and the low-level functions than that, and it does cause performance bottlenecks. So in theory I agree that using high-level languages to script calls to low-level code is a good idea, but in practice that gets abused to put Python in the hot path. Perhaps if the lower-level language were the bulk of the framework and just called Python for helper functions, we'd see more performance-aware design from developers.
sigbottle · 1h ago
What's the bottleneck? Is it serializing to/from PyObjects over and over for the ML ops? I thought PyTorch was pretty good about this: tensors are views, the computation graph can be executed in parallel, and you're just calling a bunch of fast linear algebra libraries under the hood.
If it avoids excessive copying and supports parallel computation, surely it's fine?
If your model is small enough that the overhead of Python starts dominating execution time, does performance even matter that much? And if it's large enough, surely the things I mentioned outweigh the costs?
yowlingcat · 4h ago
> but in practice that gets abused to put python in the hot path
But if that's an abuse of the tools (which I agree with) how does that make it the fault of the language rather than the user or package author? Isn't the language with the "rich library ecosystem" the natural place to glue everything together (including performant extensions in other languages) rather than the other way around -- and so in your example, wouldn't the solution just be to address the abuse in pytorch rather than throw away the entire universe within which it's already functionally working?
jebarker · 3h ago
The problem is that Python allows people to be lazy and ignore subtle performance issues; that's much harder in a lower-level language. Obviously the tradeoff is that it'd slow down (or completely stop) some developers. I'm really just wondering out loud whether the constraints of a lower-level language would help people write better code in this case, and whether that trade-off would be worth it.
efavdb · 2h ago
FWIW I would be up for writing in C or something else, but I use Python for the packages / network effects.
giancarlostoro · 6h ago
I think it's about more than just the available libraries; the industry has simply come to prefer Python. Python is a really rich, modern language. It might be quirky, but so is every single language you can name. Nothing is quite as quirky as JavaScript, though; maybe VB6, but that's mostly dead, if slightly lingering.
Mind you I've programmed in all the mentioned languages. ;)
whattheheckheck · 6h ago
It's the ease of distribution of packages and big functionality being a pip install away
kstrauser · 5h ago
That's the killer feature. Whatever it is you want to do, there's almost certainly a package for it. The joke is that Python's the second best language for everything. It's not the best for web backends, but it's pretty great. It's not the best for data analysis, but it's pretty great. It's not the best at security tooling, but it's pretty great. And it probably is the best language for doing all three of those things in one project.
scj · 3h ago
Wouldn't it be nice if popular libraries could export to .so files, so the best language for a task could use the bits & pieces it needed without a programmer needing to know Python (and possibly C)?
Were I to write a scripting language, trivial export to .so files would be a primary design goal.
username223 · 3h ago
Unfortunately the calling conventions and memory models are all different, so there's usually hell to pay going between languages. Perl passes arguments on a stack, Lisp often uses tagged integers, Fortran stores matrices in the other order, ... it goes on and on. SWIG (https://swig.org) can help a lot, but it's still a pain.
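The impedance mismatch shows up even in the simplest cases. C's qsort, for instance, wants a C function pointer obeying the platform calling convention, so a Python comparison function has to be wrapped in a CFUNCTYPE trampoline before it can cross the boundary. A small sketch against libc:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# qsort's comparator is int (*)(const void *, const void *). A plain
# Python function can't be passed directly; CFUNCTYPE builds a
# C-callable trampoline with the right calling convention.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

@CMPFUNC
def compare(a, b):
    return a[0] - b[0]

# Likewise, a Python list is an array of object pointers, not a C int
# array, so the data must be converted too.
values = (ctypes.c_int * 5)(5, 1, 4, 2, 3)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), compare)
print(list(values))  # [1, 2, 3, 4, 5]
```

Every boundary crossing needs this kind of marshalling in both directions, which is exactly the busywork SWIG and friends try to generate for you.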
username223 · 3h ago
Hah!
Ruby, Python, and Perl all had similarly good package ecosystems in the late 1990s, and I think any of them could have ended up as the dominant scripting language. Then Google chose Python as its main scripting language, invested hundreds of millions of dollars, and here we are. It's not as suitable as Matlab, R, or Julia for numerical work, but money made it good enough.
(Sort of like how Java and later JavaScript VMs came to dominate: you can always compensate for poor upfront design with enough after-the-fact money.)
kjellsbells · 1h ago
I think that gives Google too much credit (blame?). Perl, for example, started to become increasingly painful as the objects users wanted to manipulate outstripped the natural reach of the language (hence the infamous modem noise sigil pile up, @$[0]->\$foo@ etc). It also did not help that the Perl community took a ten year diversion into Perl6/Raku. Circa 2005, Python looked like a fresh drink compared to Perl.
kstrauser · 1h ago
Yep. CPAN was impressive in the late 90s. I loved writing Perl at the time, other than the sigil explosion. The first time I wrote some Python (“OMG, everything is a reference?!”) was just about the last time I ever wrote any new Perl code.
I made that switch before I’d ever heard of Google, or Ruby for that matter. My experience was quite common at the time.
wallunit · 5h ago
This is actually rather a reason to avoid Python, in my opinion. You don't want pip polluting your system with untracked files. There are tools like virtualenv to contain your Python dependencies, but that isn't the default, and pip is generally rather primitive compared to npm.
bee_rider · 4h ago
Ubuntu complains now if you try to use pip outside a virtual environment… I think things are in a basically ok state as far as that goes.
Arguably it could be a little easier to automatically start up a virtual environment if you call pip outside of one… but, I dunno, default behavior that papers over too many errors is not great. If they don’t get a hard error, confused users might become even more confused when they don’t learn they need to load a virtual environment to get things working.
montebicyclelo · 4h ago
The industry standard has been Poetry for a good few years now, and uv is the newer, exciting tool in this space. Both create universal lockfiles from more loosely specified dependencies in pyproject.toml, resulting in reproducible environments across systems (they create isolated Python environments per project).
I really hope we are at the end game with poetry or uv. I can't take it anymore.
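For reference, both tools read the same loosely pinned dependencies from pyproject.toml; a minimal, illustrative file (the project name and dependency are made up):

```toml
[project]
name = "my-app"              # illustrative name
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "requests>=2.31",        # loose constraint; the lockfile pins the exact version
]
```

`poetry lock` or `uv lock` then resolves these into an exact, cross-platform lockfile, which is what makes installs reproducible across machines.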
ashishb · 5h ago
I rewrote a simple RAG ingestion pipeline from Python to Go.
It reads from a database.
Generates embeddings.
Writes it to a vector database.
- ~10X faster
- ~10X lower memory usage
The only problem is that you have to spend a lot of time figuring out how to do it.
All instructions on the Internet and even on the vector database documentation are in Python.
chpatrick · 4h ago
If speed and memory use aren't a bottleneck then "a lot of time figuring out how to do it" is probably the biggest cost for the company. Generally these things can be run offline and memory is fairly cheap. You can get a month of a machine with a ton of RAM for the equivalent of one hour of developer time of someone who knows how to do this. That's why Python is so popular.
kgeist · 3h ago
>I rewrote a simple RAG ingestion pipeline from Python to Go
I also wrote a RAG pipeline in Go, using OpenSearch for hybrid search (full-text + semantic) and the OpenAI API. I reused OpenSearch because our product was already using it for other purposes, and it supports vector search.
For me, the hardest part was figuring out all the additional settings and knobs in OpenSearch to achieve around 90% successful retrieval, as well as determining the right prompt and various settings for the LLM. I've found that these settings can be very sensitive to the type of data you're applying RAG to. I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning too
ashishb · 3h ago
> I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning too
There are Python libraries that will simplify the task by giving a better structure to your problem. The knobs will be fewer and more high-level.
kristjansson · 5h ago
One ... could? But it doesn't seem particularly ergonomic.
jebarker · 4h ago
Ergonomics isn't the point, performance is.
mkoubaa · 2h ago
Nobody has ever, in the history of Python, called the Python C API easy.
SandmanDP · 6h ago
I’ve been curious, what are the motivations for most projects to use Lua for enabling scripting in C over this? Is the concern around including an entire Python interpreter in a project and Lua is lighter?
crote · 5h ago
Lua is absolutely trivial to isolate. As the embedder, you have complete control over what the interpreter and VM are doing. Don't want your Lua scripts to have file access? Don't hook up those functions and you're done. Want to guard against endless loops? Tell the VM to stop after 10,000 instructions. Want to limit the amount of memory a script can use? Absolutely trivial. This makes Lua very attractive for things like game development: you can run untrusted addon code without any worry that it'll be able to mess up the game, or the rest of the system.
Doing the same with Python is a lot harder. Python is designed first and foremost to run on its own. If you embed Python, you are essentially running it beside your own code, with a bunch of hooks in both directions. Running hostile Python code? Probably not a good idea.
OskarS · 5h ago
Another thing to mention is that until very recently (Python 3.12, I think?) every interpreter in the address space shared a lot of global state, including most importantly the GIL. For my area (audio plugins) that made Python a non-starter for embedding, while Lua works great.
I agree though: biggest reason is probably the C API. Lua's is so simple to embed and to integrate with your code-base compared to Python. The language is also optimized for "quick compiling", and it's also very lightweight.
These days, however, one might argue that you gain so much from embedding either Python or JavaScript, it might be worth the extra pain on the C/C++ side.
bandoti · 6h ago
Lua is much lighter, but the key is that it's probably one of the easiest things to integrate (just copy the sources/includes into your build and it'll work), like a "header only" kind of vibe.
But, you can strip down a minimal Python build and statically compile it without too much difficulty.
I tend to prefer Tcl because it has what I feel is the perfect amount of functionality by default, with a relatively small size. Tcl also has the best C API of the bunch if you're working mostly in C.
Lua is very “pushy” and “poppy” due to its stack-based approach, but that can be fun too if you enjoy programming RPN calculators haha :)
spacechild1 · 4h ago
People already mentioned that Lua is very lightweight and easy to integrate. It's also significantly faster than Python. (I'm not even talking about LuaJIT.)
Another big reason: the Lua interpreter does not have any global variables (and therefore also no GIL) so you can have multiple interpreters that are completely independent from each other.
90s_dev · 6h ago
Network effect.
a_t48 · 6h ago
Useful, I’m going to be doing something similar w/C++ soon.
brcmthrowaway · 3h ago
How does this compare to pybind11?
nubinetwork · 3h ago
Isn't this the whole point to cffi and cython?
softwaredoug · 3h ago
Definitely, though Cython is a layer of abstraction that might feel like Python but has all kinds of its own weirdness; you might as well write in a better-understood language like C.
[1] https://datoviz.org/
This article is about embedding Python scripts inside a C codebase
[1] https://docs.python.org/3/extending/embedding.html
For example the docs for the Streamlit implementation of AgGrid contain this: https://staggrid-examples.streamlit.app/Advanced_config_and_...