Ask HN: How to learn CUDA to professional level
69 points by upmind | 28 comments | 6/8/2025, 10:52:35 AM
Hi all,
I was wondering what books/courses/projects one might do to learn CUDA programming.
(To be frank, the main reason is that a lot of companies I'd wish to work for require CUDA experience -- this shouldn't change your answers, hopefully; just wanted to provide some context.)
CUDA itself is just a minor departure from C++, so the language itself is no big deal if you've used C++ before. But, if you're trying to get hired programming CUDA, what that really means is they want you implementing AI stuff (unless it's game dev). AI programming is a much wider and deeper subject than CUDA itself, so be ready to spend a bunch of time studying and hacking to come up to speed in that. But if you do, you will be in high demand. As mentioned, the fast.ai videos are a great introduction.
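For a sense of how small that departure is, here is a minimal sketch (names are illustrative, error handling omitted):

    #include <cuda_runtime.h>

    // __global__ marks a function that runs on the GPU, one thread per element.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];   // guard: the last block may overshoot n
    }

    // The <<<blocks, threads>>> launch syntax is the main addition to C++:
    // vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

Everything else -- pointers, templates, RAII on the host side -- is the C++ you already know.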
In the case of games, that means 3D graphics which these days is another rabbit hole. I knew a bit about this back in the day, but it is fantastically more sophisticated now and I don't have any idea where to even start.
I have two beginner (and probably very dumb) questions: why do they have heavy C++/CUDA usage rather than using only PyTorch/TensorFlow? Are those too slow for training Leela? Second, why is there TensorFlow code?
Some resources:

- https://www.gpumode.com/ (resources and Discord community)
- Book: Programming Massively Parallel Processors
- The NVIDIA CUDA docs are very comprehensive too
- https://github.com/srush/GPU-Puzzles
1. Learning CUDA - the framework, libraries, and high-level wrappers. This is something that changes with the times and trends. (See the sketch after this list.)
2. Learning high-performance computing approaches. While the GPU and NVLink interfaces are Nvidia-specific, working in a massively parallel distributed computing environment is a general branch of knowledge that translates across HPC architectures.
3. Application specifics. If your thing is Transformers, you may just as well start from PyTorch, TensorFlow, etc., and rely on the current high-level abstractions to guide your learning down to the fundamentals.
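To make the layering in point 1 concrete, here is a sketch (the function names are mine, not from any codebase) of the same operation written against a library versus as a raw kernel:

    #include <cublas_v2.h>

    // Library layer: y = alpha*x + y via cuBLAS; d_x and d_y are device pointers.
    void saxpy_cublas(int n, float alpha, const float* d_x, float* d_y) {
        cublasHandle_t handle;
        cublasCreate(&handle);
        cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);
        cublasDestroy(handle);
    }

    // Fundamentals layer: the same operation as a hand-written kernel.
    __global__ void saxpy_kernel(int n, float alpha, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = alpha * x[i] + y[i];
    }

Knowing both layers, and when each is appropriate, is most of the battle.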
I’m no longer active in any of the above, so I can’t be more specific, but if you want to master CUDA, I would say that learning how massively parallel programming works is the foundation that translates into transferable skills.
These sorts of books are only "dated" when it comes to specific languages/frameworks/libraries. The methods/techniques are evergreen and often conceptually better explained in these older books.
For recent, up-to-date work on HPC, the free multi-volume The Art of High Performance Computing by Victor Eijkhout can't be beat - https://news.ycombinator.com/item?id=38815334
Disclaimer: I don't claim that this is actually a systematic way to learn it, and it is geared more toward academic work.
I got assigned to a project that required learning CUDA as part of my PhD. There was no one in my research group who had any experience with CUDA. I started with the standard NVIDIA courses (Getting Started with Accelerated Computing with CUDA C/C++; there is a Python version too).
This gave me a good introduction to the concepts and basic ideas, but after that I did most of my learning by trial and error. I tried a couple of online tutorials for specific things, and some books, but there was always a deprecated function here or there, or an API change that made things obsolete. Or things simply changed with your GPU: you have to be careful, because the GPU version you develop on might not be compatible with the one you target in production, and you need things to work on both.
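On the GPU-version pitfall: the usual mitigation, as far as I know, is building a fat binary that embeds machine code for every architecture you must support, plus PTX so newer GPUs can JIT-compile it (the compute capabilities below are just examples):

    nvcc kernel.cu -o app \
      -gencode arch=compute_70,code=sm_70 \
      -gencode arch=compute_90,code=sm_90 \
      -gencode arch=compute_90,code=compute_90   # embed PTX for forward compatibility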
For me, learning CUDA has been an endeavor of pain and of going through compute-sanitizer and Nsight, because you will find that most of your time goes into debugging why things run slower than you expect.
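A sketch of the kind of bug that eats those debugging hours -- and that compute-sanitizer catches immediately:

    __global__ void scale(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        data[i] *= 2.0f;   // bug: missing `if (i < n)`, so the last block writes past the end
    }

    // Running `compute-sanitizer ./app` reports something like
    //   "Invalid __global__ write of size 4 bytes"
    // along with the offending kernel and thread coordinates.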
Take things slowly. Take a simple project that you know how to do without CUDA, then port it to CUDA, benchmark it against the CPU, and try to optimize different aspects of it.
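For the benchmarking step, cudaEvent timing is the standard pattern (a sketch; myKernel stands in for whatever you ported, and the first launch includes warm-up cost, so time a later one):

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    myKernel<<<blocks, threads>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);               // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // milliseconds; compare with your CPU timing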
One piece of advice that can be helpful: don't think about optimization at the beginning. Start with correct, then optimize. A working slow kernel beats a fast kernel that corrupts memory.
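For example, a first pass at a max-of-array kernel can just funnel everything through one atomic -- slow, but a trustworthy baseline to optimize against (a sketch for int data):

    // Correct and simple: every thread hits the same atomic.
    // Initialize *result to INT_MIN before the launch.
    __global__ void maxNaive(const int* data, int n, int* result) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicMax(result, data[i]);
    }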
Overall, there will be some pain for sure. And mastering it, including PTX etc., will take a lot of time.
This is so true it hurts.
I found it easy to start. Then there was a pretty nice learning curve getting to warps, SMs, and the basic concepts. Then I was able to dig deeper into the integer opcodes, which was super cool. I was able to optimize the compute part pretty well, without many roadblocks.
However, getting memory loads perfect, and then getting closer to the hardware (warp groups, divergence, the L2 cache split thing, scheduling), was pretty hard.
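The memory-load part is mostly about coalescing: whether the 32 threads of a warp touch consecutive addresses. A sketch of the difference:

    // Coalesced: thread i reads a[i]; a warp's loads merge into a few transactions.
    __global__ void copyCoalesced(const float* a, float* b, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) b[i] = a[i];
    }

    // Strided: thread i reads a[i * 32]; every load lands in a different cache line,
    // so the same data volume costs far more memory traffic.
    __global__ void copyStrided(const float* a, float* b, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i * 32 < n) b[i] = a[i * 32];
    }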
I'd say CUDA is pretty nice/fun to start with, and it's possible for a novice programmer to get quite far. However, getting deeper and achieving a real advantage over the CPU is hard.
Additionally, there is a problem with Nvidia segmenting the market: some opcodes are present only in _old_ GPUs (CUDA arch is _not_ forwards compatible), and some opcodes are reserved for "AI" chips (like the H100). So getting code that is fast on both an H100 and an RTX 5090 is super hard. Add to that the fact that each card has a different SM count, memory capacity, and bandwidth... and you end up with an impossible compatibility matrix.
TL;DR: The beginnings are nice and fun, and you can get quite far optimizing the compute part. But getting memory access right and staying compatible across different chips is hard. When you start, choose a specific problem, a specific chip, and a specific instruction set.
Scientific Parallel Computing by L. Ridgway Scott et al. - https://press.princeton.edu/books/hardcover/9780691119359/sc...
Make sure you are very clear on what you want. Most HR departments cast a wide net (it's like how every junior role requires "3-5 years of experience" when in reality they don't really care). Similarly, when hiring, most companies pray for the unicorn developer who can understand the entire stack from the GPU to the end-user product domain, when the day-to-day is mostly in Python.
Start writing some CUDA code to sort an array or find the maximum element.
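A sketch of the max-element exercise using a shared-memory tree reduction (assumes a block size of 256 and int data; the per-block results still need a second pass, or an atomicMax, to combine):

    #include <climits>

    __global__ void maxBlock(const int* data, int n, int* blockMax) {
        __shared__ int smem[256];                  // one slot per thread in the block
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        smem[threadIdx.x] = (i < n) ? data[i] : INT_MIN;
        __syncthreads();

        // Tree reduction: halve the number of active threads each step.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s)
                smem[threadIdx.x] = max(smem[threadIdx.x], smem[threadIdx.x + s]);
            __syncthreads();
        }
        if (threadIdx.x == 0) blockMax[blockIdx.x] = smem[0];
    }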
If that is not an option, I'll wait!