Quite slow. It's largely due to the author using FPGAs wrong. Clocking down a 7-series Artix to 62.5 MHz means the design is not pipelined correctly/enough. My friend got 1 SHA256 hash per cycle at 300 MHz on 7 series, though slightly fewer copies of the design fit on a chip. Throughput would easily be in the GH/s range.

Keep in mind RTX4090 is on a 5 nm process node and has a lot more transistors and memory than XC7A100T, which is 28 nm. That's a huge difference in terms of dynamic performance. The two were also released 10 years apart. If you compare RTX4090 against a similarly modern UltraScale part from Xilinx, I believe the FPGA can be notably faster than RTX4090.

benlivengood · 11h ago

I'm assuming this space has already been heavily optimized by the Bitcoin miners on their way to ASICs.

15155 · 10h ago

Yes, but a designed-for-FPGA SHA256 implementation looks very different than an ASIC SHA256 implementation - the ASIC has far greater routing flexibility and density, and can therefore use far more combinatorial logic between register stages.

(ASIC simulation on an FPGA will retain the combinatorial stages but run at dramatically lower fMax)

benlivengood · 4h ago

I should have been a little clearer. I meant that the miners spent a brief period optimizing FPGAs before they abandoned them entirely for ASICs, but during that brief period I'm guessing they squeezed as many hashes/watt out of the FPGAs as they could.

picture · 11h ago

Yes, hard silicon will be another order of magnitude more performant than FPGAs and GPUs, but ASICs properly take on negative value when they're no longer profitable to mine with. (Note that efficiency won't be much better at the same process node. You can just pump more power through each ASIC die.)

Edit - I misread your comment. ASIC designers will use FPGAs to test their design, but it won't be optimized for FPGAs, which have different logic-and-memory characteristics than ASICs. There aren't many great SHA256 FPGA implementations, largely because there's not that much demand for one.

the8472 · 10h ago

> but ASICs properly take on negative value when they're no longer profitable to mine with

No matmul coin where the hardware could be repurposed for AI stuff?

15155 · 9h ago

Modern BTC ASICs consist of 1600-3200 SHA256 cores and only output nonces for sha256(sha256(btcBlockHeader)) - there's no memory or ability to obtain other output.
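For readers unfamiliar with the operation described above, here is a minimal software sketch of a nonce search over sha256(sha256(header)). It is only an illustration: the names `double_sha256` and `find_nonce`, the 76-byte header prefix, and the little-endian target comparison are assumptions made for this example, and real mining hardware does the same work in fixed-function logic rather than in a loop like this.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """Return sha256(sha256(data)) as 32 raw bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def find_nonce(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces in order and return the first one that meets the target.

    header_prefix is assumed to be the first 76 bytes of an 80-byte header;
    the 4-byte little-endian nonce is appended to it. The digest is read as
    a little-endian integer and compared against target (an assumption that
    mirrors how Bitcoin is usually described). Returns None if no nonce
    below max_nonce qualifies.
    """
    for nonce in range(max_nonce):
        header = header_prefix + struct.pack("<I", nonce)
        if int.from_bytes(double_sha256(header), "little") <= target:
            return nonce
    return None


if __name__ == "__main__":
    # Toy difficulty: roughly 1 in 256 digests qualifies.
    toy_prefix = bytes(76)
    print(find_nonce(toy_prefix, 2**248, max_nonce=100_000))
```

Like the ASICs in the comment, the only thing this search returns is a nonce; the intermediate digests are discarded.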

throwawaymaths · 8h ago

always thought it might be cool to repurpose fast double sha engines for error detection in storage arrays

throwawaymaths · 8h ago

matmul isn't a trapdoor function

Retr0id · 11h ago

Unfortunately I think most of that innovation happened behind closed doors, because everyone wanted to maintain their competitive advantages.

sMarsIntruder · 8h ago

Yes, ASICs are definitely very closed source for that specific reason.

15155 · 10h ago

SHA256 is extremely FF-heavy: you need around 200k flip-flops for an optimized, unrolled, pipelined implementation.

UltraScale+ chips will run a proper design at 600MHz-800MHz, big chips might be able to fit 24 cores. The Artix chip OP used is extremely slow and too small to fit this style of implementation.
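As a rough back-of-the-envelope check on the figures quoted in this thread, throughput for fully pipelined cores can be estimated as cores × clock × hashes per cycle. The sketch below assumes one hash per cycle per core and ignores I/O, routing, and thermal limits; the function name `hashes_per_second` is made up for this example.

```python
def hashes_per_second(cores: int, fmax_hz: float, hashes_per_cycle: float = 1.0) -> float:
    """Idealized throughput: every core emits hashes_per_cycle hashes each clock."""
    return cores * fmax_hz * hashes_per_cycle


# Figures quoted in the thread, treated as assumptions rather than measurements.
single_core_7_series = hashes_per_second(1, 300e6)   # one core at 300 MHz
ultrascale_low = hashes_per_second(24, 600e6)        # 24 cores at 600 MHz
ultrascale_high = hashes_per_second(24, 800e6)       # 24 cores at 800 MHz

if __name__ == "__main__":
    for label, rate in [
        ("7-series, 1 core @ 300 MHz", single_core_7_series),
        ("UltraScale+, 24 cores @ 600 MHz", ultrascale_low),
        ("UltraScale+, 24 cores @ 800 MHz", ultrascale_high),
    ]:
        print(f"{label}: {rate / 1e9:.1f} GH/s")
```

Under these assumptions a single 300 MHz core lands at 0.3 GH/s, and the 24-core UltraScale+ estimate falls between 14.4 and 19.2 GH/s.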

d00mB0t · 11h ago

More posts like this please! How about a crypto accelerator on FPGA that's integrated with OpenSSL?

15155 · 10h ago

Unless you're talking about niche algorithms (and even then), the FPGA will get smoked by a CPU for most common tasks one would use OpenSSL for.

d00mB0t · 8h ago

Yes--obviously modern CPUs have crypto extensions that would be faster than an FPGA; this would be for educational purposes.

15155 · 8h ago

Even without the extensions, by the time you've moved the workload to the FPGA and back, the CPU has already completed whatever operation your FPGA was going to complete with OpenSSL.

FPGA cryptographic acceleration is about batch task bandwidth; OpenSSL has few places where this is required.

toast0 · 7h ago

If you want to do crypto acceleration for TLS, there are two places to do it. The first is handshake/signature/key agreement, which could maybe work, but it hasn't been the bottleneck in a long time: elliptic-curve cryptography dramatically reduces the work for the server, and most clients can do it. Maybe shipping the data around for that is fine.

The other part is bulk encryption. CPUs have lots of acceleration for that, but clear text is still faster, so the win is not to ship data to an accelerator, then back to the CPU, and then out to the NIC. The win is to ship it to the accelerator and from there to the NIC without touching the CPU again; often the accelerator is integrated with the NIC.

It works even better if the data never has to touch the CPU.

15155 · 5h ago

Yes, this is why FPGAs are used as NICs in many situations, but the folks doing this are of course not using OpenSSL.

d00mB0t · 7h ago

You must be great to talk to at parties lol, I guess I shouldn't build a RISC-V CPU because Intel is faster?

15155 · 5h ago

You should definitely build a crypto accelerator - just don't integrate it into OpenSSL (painful codebase to work in, no speed benefit, etc.)

qdotme · 11h ago

Great job!

For alternative design/writeup, check out http://nsa.unaligned.org

projektfu · 7h ago

That seems to be the inverse function for SHA-1 and MD5.

bri3d · 4h ago

If you know the inverse function for SHA-1, that’s really quite something :)

That project is indeed SHA-1 and not SHA256, but the implementation is much more clever and did a very good job utilizing some very ancient FPGAs back in the day.

projektfu · 2h ago

True.

15155 · 12h ago

Now try a fully unrolled/pipelined design that emits one hash per clock cycle for actual parallelization.
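To make the difference concrete, here is a small cycle-count model comparing an iterative core with a fully unrolled, pipelined one. It is a sketch under stated assumptions: one SHA-256 round per clock, one pipeline stage per round (64 stages), single-block messages, and no stalls or I/O overhead. The function names are hypothetical.

```python
ROUNDS = 64  # SHA-256 compression rounds per 512-bit block


def iterative_cycles(n_hashes: int, rounds: int = ROUNDS) -> int:
    """One round per cycle, one hash in flight at a time."""
    return n_hashes * rounds


def pipelined_cycles(n_hashes: int, stages: int = ROUNDS) -> int:
    """Fully unrolled pipeline: the first hash takes `stages` cycles to
    emerge, then one more hash completes on every following cycle."""
    if n_hashes == 0:
        return 0
    return stages + (n_hashes - 1)


if __name__ == "__main__":
    n = 1_000_000
    it, pl = iterative_cycles(n), pipelined_cycles(n)
    print(f"iterative: {it} cycles, pipelined: {pl} cycles, speedup ~{it / pl:.1f}x")
```

For large batches the speedup approaches the number of stages, which is why the unrolled design costs so much more area but emits one hash per clock.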

m3kw9 · 11h ago

Or try hardcoding a few billion trillions of premade hashes

nayuki · 11h ago

https://en.wikipedia.org/wiki/Rainbow_table ?

m3kw9 · 54m ago

It would be called galaxy table

picture · 11h ago

I know why you're downvoted, but it's true, the author is not using FPGAs correctly.

Parallelizing SHA256 Calculation on FPGA

Comments (31)