We’ve got a temporarily underutilized 64 x AMD MI300X cluster, so instead of letting it sit idle, we’re opening it up for LLM inference.
Running: LLaMA 4 Maverick, DeepSeek V3, R1, and R1-0528. Want another open model? Let us know. We are happy to deploy it.
Prices are around 50% lower than the cheapest OpenRouter endpoints, and they will stay that way through June (maybe July).
The server handles up to 10,000 requests/sec, and we allocate GPUs per model based on demand. So feel free to load-test it, hammer it, or run production traffic. We're collecting no data whatsoever.
cloudrift.ai/inference
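If you want to try it from code, here is a minimal sketch of a chat request. Note the assumptions: the base URL, the auth scheme, and the model id string below are all hypothetical placeholders (many inference providers expose an OpenAI-compatible `/v1/chat/completions` route, but check the docs at the link above for the real endpoint and model names).

```python
import json
import os
import urllib.request

# Assumed base URL and route -- verify against cloudrift.ai/inference docs.
BASE_URL = "https://inference.cloudrift.ai/v1/chat/completions"

# Model id is a guess at the naming convention; the provider's model list
# endpoint (if any) is the source of truth.
payload = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 16,
}

req = urllib.request.Request(
    BASE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Bearer auth is assumed; key name CLOUDRIFT_API_KEY is made up.
        "Authorization": f"Bearer {os.environ.get('CLOUDRIFT_API_KEY', '')}",
    },
)

# Uncomment to actually send the request (needs a valid key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If the API really is OpenAI-compatible, the official `openai` Python client with a custom `base_url` would also work and saves the manual plumbing.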
Full disclosure: I am the founder. We're trying to make good use of this capacity. Let us know if you have ideas on how to utilize the cluster meaningfully. We're happy to hear your feedback.