If you're going to the effort of writing a proc macro, you may as well have the macro emit a string instead of code.
If you're going for idiomatic Rust, you might instead output a type that has a Display impl rather than generating code that writes to stdout.
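A minimal sketch of the Display idea, assuming a hypothetical newtype called `FizzBuzz` (the name and layout are made up for illustration, not taken from the article). The caller decides where the text goes, whether that is stdout, a file, or a `String`:

```rust
use std::fmt;

/// Hypothetical wrapper: one FizzBuzz term that knows how to format itself.
pub struct FizzBuzz(pub u64);

impl fmt::Display for FizzBuzz {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Decide once per term which of the four cases applies.
        match (self.0 % 3 == 0, self.0 % 5 == 0) {
            (true, true) => f.write_str("FizzBuzz"),
            (true, false) => f.write_str("Fizz"),
            (false, true) => f.write_str("Buzz"),
            (false, false) => write!(f, "{}", self.0),
        }
    }
}
```

With this in place, `println!("{}", FizzBuzz(15))` or `FizzBuzz(15).to_string()` both work, and nothing in the type is tied to stdout.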
jasonjmcghee · 2h ago
Reminds me of the famous thread on Code Golf Stack Exchange. I'll link the Rust one directly, but one C++ answer claims 283 GB/s, and others are in the ballpark of 50 GB/s.
You can take this much further! I think throughput is a great way to measure it.
Things like pre-allocation, avoiding branches, constants, SIMD, etc.
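A rough sketch of two of those ideas, pre-allocation and measuring throughput, assuming hypothetical helpers named `fill` and `throughput_bytes_per_sec`. It does not attempt the branch-free or SIMD tricks, and it is nowhere near the numbers quoted above:

```rust
use std::fmt::Write as _;
use std::time::Instant;

/// Appends FizzBuzz lines for 1..=n into `buf`, reusing its allocation.
pub fn fill(buf: &mut String, n: u64) {
    buf.clear(); // keeps the capacity, drops the old contents
    for i in 1..=n {
        match (i % 3 == 0, i % 5 == 0) {
            (true, true) => buf.push_str("FizzBuzz\n"),
            (true, false) => buf.push_str("Fizz\n"),
            (false, true) => buf.push_str("Buzz\n"),
            // Writing into a String cannot fail, so the result is ignored.
            (false, false) => {
                let _ = writeln!(buf, "{i}");
            }
        }
    }
}

/// Bytes generated per second over `rounds` refills of a single buffer.
pub fn throughput_bytes_per_sec(n: u64, rounds: u32) -> f64 {
    // 9 bytes per line covers "FizzBuzz\n" and any number below 10^8.
    let mut buf = String::with_capacity(n as usize * 9);
    let start = Instant::now();
    let mut total = 0usize;
    for _ in 0..rounds {
        fill(&mut buf, n);
        total += buf.len();
    }
    total as f64 / start.elapsed().as_secs_f64()
}
```

This only times generation into memory; writing to a pipe or terminal adds its own cost, so the result is not comparable to an end-to-end measurement.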
ainiriand · 1h ago
In my opinion, a more accurate measure once you get down to the microsecond level is reading the TSC (time stamp counter) directly from the CPU. I've built a benchmark tool for that: https://github.com/sh4ka/hft-benchmarks
I also think CPU pinning could help in this context, but I'd need to check the code on my machine first.
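For readers who haven't read the TSC from Rust, here is a minimal sketch. It is not taken from the linked repository; `ticks` and `measure` are hypothetical names, and it assumes the OS permits user-mode RDTSC. On other architectures it falls back to a nanosecond clock so the code still compiles. CPU pinning is not shown because the standard library has no API for it.

```rust
/// Reads a tick counter: the TSC on x86_64, a nanosecond clock elsewhere.
#[cfg(target_arch = "x86_64")]
pub fn ticks() -> u64 {
    // SAFETY: RDTSC exists on every x86_64 CPU. It is not a serializing
    // instruction, so nearby instructions may be reordered around it.
    unsafe { std::arch::x86_64::_rdtsc() }
}

#[cfg(not(target_arch = "x86_64"))]
pub fn ticks() -> u64 {
    use std::sync::OnceLock;
    use std::time::Instant;
    // Fallback: nanoseconds since the first call.
    static START: OnceLock<Instant> = OnceLock::new();
    START.get_or_init(Instant::now).elapsed().as_nanos() as u64
}

/// Runs `f` and returns its result together with the elapsed tick count.
pub fn measure<T>(f: impl FnOnce() -> T) -> (T, u64) {
    let start = ticks();
    let out = f();
    let end = ticks();
    (out, end.wrapping_sub(start))
}
```

Ticks are not nanoseconds; converting them needs the TSC frequency, which this sketch does not try to determine.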
hyperhello · 2h ago
Maybe I’m missing something but can’t you unroll it very easily by 15 prints at a time? That would skip the modulo checks entirely, and you could actually cache everything but the last two or three digits.
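A sketch of what unrolling by 15 could look like, assuming hypothetical helpers `push_block` and `fizzbuzz_unrolled`. The positions of Fizz, Buzz and FizzBuzz repeat every 15 numbers, so the pattern is baked into one format string and no modulo is needed per number. It does not implement the digit-caching part of the idea:

```rust
use std::fmt::Write as _;

/// Appends the lines for base+1..=base+15. `base` must be a multiple of 15.
pub fn push_block(out: &mut String, base: u64) {
    debug_assert!(base % 15 == 0);
    // Only 8 of every 15 lines are numbers; the rest are fixed words.
    let _ = write!(
        out,
        "{}\n{}\nFizz\n{}\nBuzz\nFizz\n{}\n{}\nFizz\nBuzz\n{}\nFizz\n{}\n{}\nFizzBuzz\n",
        base + 1,
        base + 2,
        base + 4,
        base + 7,
        base + 8,
        base + 11,
        base + 13,
        base + 14
    );
}

/// Builds the output for 1..=15*blocks, one unrolled block at a time.
pub fn fizzbuzz_unrolled(blocks: u64) -> String {
    let mut out = String::new();
    for b in 0..blocks {
        push_block(&mut out, b * 15);
    }
    out
}
```

The catch is that the block length is the least common multiple of the divisors, so adding 7 turns 15 lines per block into 105.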
Terretta · 1h ago
> Maybe I’m missing something but can’t you unroll it very easily by 15...
Sure, 3 x 5 = 15. But, FTA:
> But then, by coincidence, I watched an old Prime video and decided to put the question to him: how would you extend this to 7 = "Baz"?
> He expanded the if-else chain: I asked him to find a way to do it without explosively increasing the number of necessary checks with each new term added. After some hints and more discussion...
Which is why I respectfully submit that almost all examples of FizzBuzz, including the article's first, are "wrong" while the refactor is "right".
As for the optimizations, they don't focus only on 3 and 5; they include 7 throughout.
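To make the "no explosion of checks" point concrete, here is one common way to do it, sketched with a hypothetical rule table and a `term` function. This is not the article's refactor, which isn't quoted in this thread, just an illustration of the idea that each new term should cost one check rather than doubling the branches:

```rust
/// Hypothetical rule table: (divisor, word). Adding a term is one more row.
pub const RULES: &[(u64, &str)] = &[(3, "Fizz"), (5, "Buzz"), (7, "Baz")];

/// Builds the output for `n` with one divisibility check per rule.
pub fn term(n: u64, rules: &[(u64, &str)]) -> String {
    let mut s = String::new();
    for &(divisor, word) in rules {
        if n % divisor == 0 {
            s.push_str(word);
        }
    }
    // No rule matched, so fall back to the number itself.
    if s.is_empty() {
        s = n.to_string();
    }
    s
}
```

With three rules an if-else chain needs up to eight distinct branches, while this loop does three checks; `term(105, RULES)` concatenates all three words because 105 = 3 x 5 x 7.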
The Rust one claims around 3 GB/s:
https://codegolf.stackexchange.com/a/217455