Advice to Tenstorrent

77 lexoj 62 5/25/2025, 9:37:59 PM github.com ↗

Comments (62)

samsartor · 1d ago

I'm doing my PhD in ML shit. Before that I was a systems programming guy, lots of C++, bit of CUDA, big fan of Rust. On the side I'm obsessed with RISC-V. Own a couple of boards. I made a stupid little cuda-like-compiler on top of the RISC-V vector extensions, just for fun.

What I'm saying is, Tenstorrent couldn't find a more excitable third-party developer if they grew one in a lab. And you know what? I can't make heads or tails out of all their various abstractions. I've tried! I've read the docs, I've read the examples, I've gone to meetups. I think OP is right that "one more abstraction bro" probably doesn't solve the problem.

At a guess, the problem isn't a technical one, it is an organizational one. They don't have anybody to stand in for me, or devs like me (e.g. dumb people). There is no product leadership on the API design. Just a lot of really brilliant engineers obsessively tuning for their own use cases, unwilling to ever trade a hit in performance or expressivity for readability or writability.

1024bees · 1d ago

There is a natural tension between developing an API that is nice to use and having a full-fledged graph compiler. Most graph compilers, and the hardware that requires them, will be complex and difficult to approach. The "original sin" was pytorch vs tensorflow -- tensorflow capturing the entire graph and then compiling it with XLA (or whatever it was before, I'm probably mixing up tf1 and tf2 here) was such an intractable mess to actually hack on (also the runtime had unapproachable complexity, from what I recall). This has probably changed, but pytorch won out because it was both nice to use and nice to develop.

There are clear reasons why a hardware company would use a graph compiler -- they think such an approach is higher performance, and makes tenstorrent look better on performance per dollar when compared to competitors (read: nvda).

There is some legitimate criticism of TT here: their hardware is composed of simple blocks that compose into a complex system (5 separate CPUs being programmed per tensix tile, many tiles per chip), and that complexity has to be wrangled in the software stack -- paying that complexity in hardware so there is less of a VLIW model in software might remove a few abstractions.

liaopeiyuan · 1d ago

This is my sentiment too after trying to get a Blackhole to run a recent VLM (like Pixtral) over the weekend. Not just unit tests, but actual training loops. I write a lot of JAX in my day job to train large models but I used to do a bit of ML compiler development, which I guess also puts me in the dumb people crowd. I'm equally impressed by how smooth the lower-level setup is and frustrated by how little progress I was able to make towards the seemingly last mile of "just rewrite the code a little bit more bro I just need to get rid of this one hlo op because it's not supported."

I don't think anyone is seriously training an NN on TT hardware at the moment and I think that's an issue. I think tinygrad works not only because geohot is one hell of an engineer but also because comma dogfoods it. TT's engineers are absolutely brilliant (from reading their commits) but I think they are stretched too thin. Bounties are not gonna work - you can't expect an outsider with no internal access/bandwidth/knowledge to suddenly make e.g. Mixtral work as the issue spans at least across tt-xla/tt-mlir. And to agree with ^ training is a kind of artifact where good CX can only be derived from strong leadership and a leaner view of the stack. NVIDIA accumulated that over the decades and the rest are trying to catch up by aggressive hiring (not to say that hiring is necessary). E.g. Annapurna had a presence on the CMU campus when I was there and has the Anthropic team to test it out.

I'm an incredibly excited third-party developer as I think the pitch appeals a lot to grad students (who do model research) who need to run small experiments within the 13B range and reasonably scale them up to draw the first half of the scaling curve.

I lose too much productivity to abstractions and incomplete e2e support in TT's current shape. I'd love to give it another go in 6 months.

mlazos · 1d ago

I’m amazed this is even viewed as a “hot take” tbh most of what he said here is pretty high level of abstraction and standard practice for custom hardware. In essence I feel like he’s saying nothing really controversial other than publicly calling out TT for too many abstraction layers (and tbh it’s just in a readme). This is completely fine, he’s a user and this is his experience.

I’m a dev working on torch.compile at meta (previously I worked on ML focused FPGAs) and the approach I would use is build a static graph compiler, use torch.compile (and probably JAX) as graph extraction front-ends and call it a day. I feel like hardware companies don’t know how to handle the flexibility of PyTorch and as a result develop their own APIs which is mistake #1 and virtually makes it impossible to get any market penetration once you head down that path because nobody will ever ever rewrite their models for your hardware when they don’t even know what perf they will get, the risk is just too high. As a result, hardware companies offer inference APIs which hide all of this behind a REST API to basically paper over the lack of generality of the software/hardware interface. This is convenient because then nobody actually knows the perf/$ and they can burn VC money for as long as they want. Whether this is a viable business model or not, we will have to wait until they go public to actually see what their true inference costs are.

To sum it up, start from PyTorch and work your way down to your hardware, this is the only general way if you want to actually sell chips and not just constantly port the model of the day to your hardware.
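As a rough illustration of what a "graph extraction front-end" does, here is a toy sketch in plain Python. It is not torch.compile or JAX: the names `Node`, `trace`, `linearize`, and `run_graph` are hypothetical and exist only for this example. The point it tries to show is that the user's model code stays ordinary Python, a tracer records the ops into a static graph, and a backend then consumes that graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple
import operator


@dataclass(frozen=True)
class Node:
    """One value in the captured graph (hypothetical, for illustration)."""
    op: str                         # "input", "const", "add", or "mul"
    inputs: Tuple["Node", ...] = ()
    value: object = None            # input name or constant value

    # Operator overloads record an op instead of computing a number.
    def __add__(self, other): return Node("add", (self, lift(other)))
    def __radd__(self, other): return Node("add", (lift(other), self))
    def __mul__(self, other): return Node("mul", (self, lift(other)))
    def __rmul__(self, other): return Node("mul", (lift(other), self))


def lift(x) -> Node:
    """Wrap plain Python numbers as constant nodes."""
    return x if isinstance(x, Node) else Node("const", value=x)


def trace(fn: Callable, arg_names: List[str]) -> Node:
    """Run the user's function once on symbolic inputs to capture its graph."""
    return fn(*(Node("input", value=name) for name in arg_names))


def linearize(root: Node) -> List[Node]:
    """Topologically ordered node list, the form a backend would lower."""
    order: List[Node] = []
    seen = set()

    def visit(node: Node) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(root)
    return order


OPS = {"add": operator.add, "mul": operator.mul}


def run_graph(root: Node, env: Dict[str, float]) -> float:
    """Stand-in 'backend': interprets the captured graph on real numbers."""
    values: Dict[int, float] = {}
    for node in linearize(root):
        if node.op == "input":
            values[id(node)] = env[node.value]
        elif node.op == "const":
            values[id(node)] = node.value
        else:
            values[id(node)] = OPS[node.op](*(values[id(i)] for i in node.inputs))
    return values[id(root)]


# User code: written once, with no hardware-specific API in sight.
def model(x, w):
    return x * w + 1.0


graph = trace(model, ["x", "w"])
ops = [node.op for node in linearize(graph)]
result = run_graph(graph, {"x": 3.0, "w": 2.0})
```

A real front-end also has to deal with control flow, shapes, and dtypes, which this sketch ignores entirely.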

IncreasePosts · 1d ago

I thought George was going to save AMD. Now he's saving tenstorrent? Busy guy!

bn-l · 1d ago

Must we snark on every single ducking post? It makes reading these comments a bit exhausting.

add-sub-mul-div · 1d ago

Don't forget he was also going to save Twitter but noped out after a few weeks.

henning · 1d ago

Maintaining and improving existing software sure is boring and often thankless compared to starting flashy new projects where you get to make and understand all the major decisions up front.

Aurornis · 1d ago

> Maintaining and improving existing software sure is boring and often thankless

There was a big public episode where he appealed to Elon Musk for a job at Twitter, was given an “internship”, tried soliciting public submissions for the code he was tasked with, left a lot of people shocked that he struggled with basic FE work, and then resigned 4 weeks in: https://news.ycombinator.com/item?id=34074344

imtringued · 1d ago

I think he is right about AMD but completely misses the mark when it comes to tenstorrent. He is ranting about exponential linear unit (elu), which hardly seems to be something that could possibly hold an AI company back. If the hardware is running and training models reliably, then it's just a matter of pricing to stay competitive. Poor optimization cuts your margins, but the incentives are aligned.

With AMD the experience is so poor that you have to save the company from itself if you want to make progress.

latchkey · 1d ago

People refuse to believe it, but the AMD experience is getting better every day by leaps and bounds. Over the last few months, there is a brand new focus on improving the software. There is still a long way to go, but the company is absolutely trying to save itself.

georgehotz · 1d ago

AMD has legitimately been making great progress. They still have a long way to go, and I appreciate SemiAnalysis taking up the mantle of calling them out, but I ran:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3

today on stream and it just worked. No external ROCm install, and just the amdgpu driver that's in Arch.

We also have our own complete driver/runtime now in tinygrad; it's so much nicer to build on a foundation when you can blame yourself for the bugs.

latchkey · 1d ago

Regarding SA, I’m all for holding AMD accountable, but let’s at least get the facts right, and maybe don’t come at it with a history of cheerleading for Nvidia.

Maybe SA might set their sights on you next?

Aurornis · 1d ago

geohot got a lot of press and attention by coming out aggressively at AMD during a moment when their software really was weak.

The short-term payoff of that drama created a long-term problem where the only way they could look good was by outrunning the progress of the engineers who were inside the company, well funded, and already familiar with everything. It was an impossible goal from the start but he made it even more impossible by attacking AMD. AMD was smart to basically ignore them and wait for him to give up, as opposed to inviting that drama to crossover in-house or split their user base.

latchkey · 1d ago

They didn't ignore him and he didn't give up. They gave him two boxes and he added MI300x support to tinygrad.

georgehotz · 1d ago

We got the MI300X box on MLPerf too, and with every MLPerf from here on, general tinygrad improvements should bring down the times. We're still quite focused on AMD.

Like it's strange people think I give up on things, I think they listen to the media too much. This is a 2+ year long project that I've worked on almost every day. https://geohot.github.io/blog/jekyll/update/2023/05/24/the-t...

modeless · 1d ago

He's not ranting about ELU. He's using ELU as an example of something that shouldn't be in the lower abstraction layers.
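To make the layering point concrete, here is a small sketch (plain Python, scalars only) of ELU written as a composition of a handful of primitives, so that it could live in a front-end library instead of the low-level op set. The primitive names (`exp`, `sub`, `mul`, `minimum`, `where`) and the choice of primitive set are assumptions for illustration, not the actual op list of tinygrad or any Tenstorrent layer.

```python
import math


# Hypothetical low-level primitives; a real stack would run these on tensors.
def exp(x: float) -> float:
    return math.exp(x)


def sub(a: float, b: float) -> float:
    return a - b


def mul(a: float, b: float) -> float:
    return a * b


def minimum(a: float, b: float) -> float:
    return a if a < b else b


def where(cond: bool, a: float, b: float) -> float:
    return a if cond else b


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU(x) = x if x > 0 else alpha * (exp(x) - 1), built only from primitives."""
    # Clamp before exp so the unused negative branch cannot overflow for large x.
    negative_branch = mul(alpha, sub(exp(minimum(x, 0.0)), 1.0))
    return where(x > 0, x, negative_branch)
```

Nothing in `elu` needs its own entry in the lowest layer: a compiler that understands the five primitives can fuse and lower it without knowing the activation by name.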

moralestapia · 1d ago

geohot has a loud mouth ... but he has earned the cred and walks the walk.

I wish there were a thousand more geohots instead of all the mediocre middle-managers at AMD or tenstorrent; or people who have never done anything beyond posting snarky comments in online forums.

Aurornis · 1d ago

> geohot has a loud mouth ... but he has earned the cred

Sadly, I think geohot is an example of someone who earned some cred for impressive accomplishments in the past and then tried to cash in that cred over and over again in unrelated future domains.

His brief and very public flame out at Twitter after mysteriously abandoning another project and the bold claims about his AMD work that never really translated to anything have really detracted from whatever past “cred” he built up. I really hope he can find a new niche and succeed, but until then it might be time to lie low on social media and avoid throwing more mud.

moralestapia · 1d ago

>geohot has a loud mouth ... but he has earned the cred

>I think geohot is an example of someone who earned some cred for impressive accomplishments in the past [...]

Huh? Yeah, that's what I wrote.

Aurornis · 1d ago

> Huh? Yeah, that's what I wrote.

You cut out the part of my post where I made my point.

Earning “cred” for past accomplishments doesn’t give someone a free pass forever to be a loud mouth.

moralestapia · 1d ago

Then I don't get the point you're trying to make.

>Earning “cred” for past accomplishments doesn’t give someone a free pass forever to be a loud mouth.

What the ... ? Sorry, but you don't have to earn that. I suggest you familiarize yourself with human rights, particularly if you won the lottery and were born in a country like the US; it is very clearly written, in unambiguous terms, the rights their citizens enjoy.

And still, I'd rather listen to geohot rambling about whatever he wants to say, than some randos on the internet who have accomplished nothing but "arguing". Respect is not a right, it is earned, he's earned mine.

Aurornis · 1d ago

> What the ... ? Sorry, but you don't have to earn that. I suggest you familiarize yourself with human rights, particularly if you won the lottery and were born in a country like the US; it is very clearly written, in unambiguous terms, the rights their citizens enjoy.

This is obtuse. I obviously wasn’t talking about human rights or the 1st amendment.

I was saying it’s illogical to suggest that because someone did one impressive thing in another domain a long time ago, we should therefore continue to value their input on every topic forever.

catgary · 20h ago

Yeah, the dude has managed to catch nobelitis for the amazing achievement of…jailbreaking iOS as a teenager.

moralestapia · 18h ago

>jailbreaking iOS as a teenager

Let's see your code.

comma.ai is an impressive achievement on its own!

catgary · 1d ago

I don’t know what you’re talking about here. You can run your mouth as much as you like…you just become a laughingstock at a certain point. Nobody is calling for geohot to be silenced.

lostmsu · 1d ago

Really you shouldn't need to know who is saying what. Words have merit, people can only have trust.

moralestapia · 1d ago

That's true!

runlevel1 · 1d ago

> but he has earned the cred and walks the walk.

One of the lessons I wish I'd learned earlier is that indulging bombastic behavior neither benefits them nor you.

Strong opinions, weakly held are good. Well-reasoned opinions to the contrary are even encouraged. Provided, of course, that they're smart enough to know when to persist and when to disagree and commit.

If you just indulge it, they end up with the engineering equivalent of Nobelitis. And if they're on your team, you end up with more burden than asset no matter how brilliant they are.

throwaway314155 · 1d ago

Guy would really benefit from learning some manners. Just comes across as painfully toxic no matter how correct he is.

edit: For what it's worth, if you can't see that this language is rude or think it is somehow acceptable for people of a certain caliber to talk this way - you're also probably toxic.

throwawaythekey · 1d ago

Not trying to flame or anything but what is your age?

As someone in my early 30's who grew up on message boards/gaming the language he is using is fairly mild. I think we just have very different social norms.

Aurornis · 1d ago

> As someone in my early 30's who grew up on message boards/gaming

That’s an extremely low bar. Nobody would bat an eye about a random person speaking like this on a gaming Discord or a message board.

Posting public messages as a public figure with an audience to a company is not the same as Call of Duty voice chat.

throwawaythekey · 18h ago

Yes and to me it is a fine tone for all of 'hacker' culture if you will. Perhaps not if you were giving a presentation to your boss, but as this is a message between peers it seems fine.

If it were on x it would seem fine.

Havoc · 1d ago

Well, he called Sony fudge packers in a YouTube rap diss track after they sued him, so I wouldn't hold my breath on that one

https://www.youtube.com/watch?v=9iUvuaChDEg

sothatsit · 1d ago

> edit: For what it's worth, if you can't see that this language is rude or think it is somehow acceptable for people of a certain caliber to talk this way - you're also probably toxic.

Personally, I prefer direct language if it gets to the root of the problem quicker. It's more pragmatic. You just have to pick your audience, because some people get offended by it. But the most productive discussions I've had have been arguments where you can both quickly find the holes in each other's positions, and then move forwards from there. As long as no one is taking it personally, this is very effective.

OTOH, I've been in many meetings where people talk around a problem for an hour, never reaching the conflict about what their disagreement actually is. To me, that is much more frustrating than someone risking offending someone by being direct. But it really depends upon the people you work with and the team you have.

yallpendantools · 1d ago

There's a line between rude/toxic and direct/pragmatic. The whole world at large would be much better off if we made that distinction. Also, both tones can be found in the same piece of prose; we don't have to label a whole piece as one or the other.

> If you want a dataflow graph compiler, build a dataflow graph compiler.

> This is not 6 layers of abstraction, it's 3 (and only 2 you have to build).

This is direct and pragmatic. It states the writer's justified true beliefs and opinions as plainly as possible.

> Plz bro one more stack this stack will be good i promise

> bro bro bro plz one more make it all back one trade type beat

This is just toxic. The writer is making assumptions about other people's position that he does not (and probably could not) substantiate.

Ironically, sprinkling in toxic comments and back-handed insults in any piece has the effect of making said piece less direct and pragmatic.

sothatsit · 1d ago

True, but I just fear that people tend to throw the baby out with the bath water, opting for safe rather than effective communication.

Only like 4 lines of what geohot said are problematic. Would it be better if they weren't there? Yes. But, it's also not as egregious as you and that throwaway account make it out to be. It's more just annoying.

vasco · 1d ago

> you can't see that this language is rude or think it is somehow acceptable for people of a certain caliber to talk this way

Who are the people of a certain caliber? Are there people who can talk like this and others that are better people who shouldn't? What a weird thing to say.

throwaway314155 · 1d ago

People hold geohot in high esteem (caliber). I'm saying that doesn't excuse the behavior. I think we're in agreement.

bn-l · 1d ago

He doesth come acrosth as tothic.

Yucky!

Georgie hotsth u r yucky and tocthic!

throwaway314155 · 17h ago

Guess I'm too much of a "limp wrist milquetoast" for your tastes. Good luck with all that.

gsf_emergency · 1d ago

Geohot is the voodoo medication I (secretly) take every day

bigyabai · 1d ago

With all due respect Mr. Geohot, you've got an awful lotta nerve to throw stones from your glass house. This guy needs to drop the middle school tough-guy tone and post numbers. The central thesis of this article is sound, lead with it:

> You aren't going to get better deals on tapeouts/IP than NVIDIA/AMD. You need some advantage.

> If you want a dataflow graph compiler, build a dataflow graph compiler.

Now explain why. Clearly Tenstorrent is happy to build Yet Another Abstraction Layer, so instead of bullying them over it you should at least attempt to actively humiliate them for the approach. You know, produce some manner of evidence that vindicates your position instead of relying on your authority alone. Jim Keller has no reason to take this seriously, even if you're right.

Without any numbers this feels like one cult of personality trying to bait another into a shit-flinging contest as a marketing scheme. We've seen this happen several times before on Hacker News, and it doesn't end up with either side making an Nvidia-killer. This is not a model for productive discourse.

RealityVoid · 1d ago

I think you're right, mostly, but... While I'm sure Jim Keller will make great silicon, I'm not sure how good he's going to be at shaping the SW platform thing. I hope he will, I'm rooting for him, but I feel that this might be a novel challenge for him.

Geohot is abrasive to say the least, and, no, this is not a model for productive discourse (I'll try not to bring in some of his hot takes on the stream because giving them a stage is probably also not productive). But I do think he has good taste in SW and he might be right about the number of layers of abstraction.

For context, geohot wrote this live on a Twitch stream.

solarpunk · 1d ago

prior to the elections here in the us he spent a few minutes of his streams every few days talking about how stable trump was gonna be for business and how democrats pitching capital gains taxes was a nonstarter.

Havoc · 1d ago

>This guy needs to drop the middle school tough-guy tone and post numbers.

Pretty sure comma is profitable? Not particularly, but for a hardware startup selling multiple iterations and not getting wrecked is a sound start

catgary · 1d ago

Wasn’t he pushed out?

Havoc · 1d ago

Definitely not. Pretty sure he has full control... don't think he even took outside capital but not 100% certain on that one

ac29 · 1d ago

As near as I can tell from a brief search, geohot left comma in 2022

htrp · 1d ago

Has geohot done anything since the original iPhone jailbreak?

The ventures he has started (I can think of tinygrad and comma ai) all seem like half finished tech demos.

roenxi · 1d ago

How many times does he have to make international news before he is qualified to let off steam in a github README? He owns a company that works in this space and his opinions usually offer insights into the ML hardware world. Although he is a little bombastic and his grammar is vulnerable to criticism.

His AMD rants were a valuable warning about the quality of their hardware. I wish he'd done that maybe 10 years ago when I was buying AMD cards thinking that they might work with pytorch in a year or so. I knew they had problems but if I'd realised how bad the situation was I'd have held my nose and gone with Nvidia.

Aurornis · 1d ago

> His AMD rants were a valuable warning about the quality of their hardware.

The rants weren’t breaking news to anyone who was familiar with PyTorch or adjacent communities. He seized upon a weak moment for AMD to try to launch his own company. Unfortunately he launched his effort with an attack on the company he was effectively trying to partner with, making the entire venture DOA.

It’s too bad, too, because it would have been interesting to see if anything could have been accomplished with a more friendly offer of cooperation. He’s obviously talented as a developer, but effectively going on the attack for the company that forms the foundation of the business you’re trying to build is obviously not going to end well.

roenxi · 1d ago

> The rants weren’t breaking news to anyone who was familiar with PyTorch or adjacent communities.

How many people in the PyTorch and adjacent communities are trying to port the stack to AMD cards? I'd guess less than 100. That is a pretty small community. They don't do that much in the way of publicity (and, case in point, George does and he attracts a certain number of haters).

George is probably the first public reference I saw who wasn't pointing at CUDA as the problem but tagging kernel & hardware bugs on AMD cards as the blocker. Those are really different things; CUDA isn't that complicated to implement as a half-baked thing. I can implement naive matrix multiplication if I need to and it'd be a fun month of reading academic papers to learn better techniques. But that does nothing to work around hardware and firmware bugs which are an order of magnitude harder to deal with and not something I want to deal with.

I'm happy to buy in to bad software libraries. I don't want a bar of hardware bugs.
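For reference, the "naive matrix multiplication" mentioned above is roughly the triple loop below. This is a sketch in plain Python on lists of lists, and the name `matmul_naive` is made up for this example. It is the kind of software gap anyone can close, in contrast to the hardware and firmware bugs the comment is complaining about.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Textbook O(n*m*k) matrix product with no tiling or vectorization."""
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    m = len(b[0])

    out = [[0.0] * m for _ in range(n)]
    for i in range(n):              # rows of a
        for j in range(m):          # columns of b
            acc = 0.0
            for p in range(k):      # shared inner dimension
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


product = matmul_naive([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

The better techniques alluded to (blocking for cache or shared memory, vectorized inner loops) change the loop structure, not the result.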

saagarjha · 1d ago

A lot of the people have been specifically not working on it because they felt that it was not a good use of their time.

throwawaythekey · 1d ago

I'm thinking about him every day lately due to a new set of AMD GPU drivers which has stopped my pc from being able to reliably wake from sleep.

coolThingsFirst · 1d ago

> his grammar is vulnerable to criticism

He definitely writes at a below-HS level.

empiricus · 1d ago

Well, at the minimum he writes more code and is more competent than me, so he has my respect and I take his opinions into consideration.

moralestapia · 1d ago

He's the founder of comma.ai

Edit: you edited your comment after I told you he made comma.ai.

Bit dishonest, but whatever, I wouldn't describe comma.ai as a "half finished tech demo" but you're entitled to your own opinion about it.

Nk26 · 1d ago

Comma.ai is the best tech I've bought in the last 3 or 4 years at least. I've put thousands of miles into it and I've never had any issues.

bn-l · 1d ago

What a bunch of absolutely limp wrist milquetoasts hackernews is / has become.

So much pearl clutching over the “tone”. Oh dear.

I thought this polemic was amusing and am sure it comes from a place of genuine concern.

yepyip · 10h ago

Exactly. I do not remember this site being so delusional. Every day it is becoming more of an example of a failed institution. This is how mind viruses work.

coolThingsFirst · 1d ago

This guy is insufferable. He failed his internship at Twitter and was asking questions about it publicly but has a strong opinion on everything and is an expert on everything tech related.