Up and running–first room-temperature quantum accelerator of its kind in Europe (nachrichten.idw-online.de)

Pass the choices through, please. It's so context dependent that I want a <dumber> and a <smarter> button, with units of $/M tokens. And another setting to send a particular prompt to "[x] batch" and email me with the answer later. For most things I'll start dumb and fast, but switch to smart and slow when the going gets rough.

ramesh31 · 11h ago

Flash is just so obscenely cheap at this point it's hard to justify the headache of self hosting though. Really only applies to sensitive data IMO.

xfalcox · 6h ago

You'd be surprised how often people in enterprise can be left waiting months to get an API key approved for an LLM provider.

cortesoft · 4h ago

There is a wide range of opinions on what should be considered sensitive data. Many people would classify a vast majority of their data as sensitive.

mkl · 8h ago

With tools like Ollama, self-hosting is easier than hosted. No sign-up, no API keys, no permission to spend money, no worries about data security, just an easy install then import a Python library. Qwen2.5-VL 7B is proving useful even on a work laptop with insufficient VRAM - I just leave it running over a night or weekend and it's saving me dozens of hours of work (that I then get to spend on other higher-value work).

genewitch · 7h ago

I got the 70b qwen llama distill, I have 24GB of vram.

I opened aider and gave a small prompt, roughly:

  Implement a JavaScript 2048 game that exists as flat file(s) and does not require a server, just the game HTML, CSS, and js. Make it compatible with firefox, at least.

That's it. Several hours later, it finished. The game ran. It was worth it because this was in the winter and it heated my house a bit, yay. I think the resulting 1-shot output is on my github.

I know it was in the training set, etc, but I wanted to see how big of a hassle it was, if it would 1-shot with such a small prompt, how long it would take.

Makes me want to try deepseek 671B, but I don't have any machines with >1TB of memory.

I do take donations of hardware.

mgraczyk · 8h ago

It does not take dozens of hours to get an API key for gemini

cortesoft · 4h ago

They weren’t saying getting the api key would take that long, just getting permission from their company to let them do it.

mkl · 8h ago

I never claimed that it did. Gemini would probably save me the same dozens of hours, but come with ongoing costs and additional starting up hurdles (some near insurmountable in my organisation, like data security for some of what I'm doing).

shmoogy · 7h ago

Gemini flash or any free LLM on openrouter would be orders of magnitude faster and effectively free. Unless you are concerned about privacy of the conversation - it's really purely being able to say you did it locally.

I definitely do appreciate and believe in the value of open source / open weight LLMs - but inference is so cheap right now for non frontier models.

jacob019 · 10h ago

That's true for Flash 2.0 at $0.40/mtok output. GPT-4.1-nano is the same price and also surprisingly capable. I can spend real money with 2.5 flash, with those $3.50/mtok thinking tokens, worth it though. OP is an inference provider, so there may be some bias. Open source can't compete on context length either, nothing touches 2.5 flash for the price with long context--I've experimented with this a lot for my agentic pricing system. Open source models are improving, but they aren't really any cheaper right now, R1 for example does quite well performance wise, but it uses a LOT of tokens to get there, further limiting the shorter context window. There's still value in the open source models, each model has unique strengths and they're advancing quickly, but the frontier labs are moving fast too and have very compelling "workhorse" offers.

behnamoh · 11h ago

You're getting downvoted but what you said is true. The cost of self-hosting (and achieving +70 tok/sec consistently across the entire context window) has never been low enough to justify open source as a viable competitor to proprietary models of OpenAI, Google, and Anthropic.

grepfru_it · 8h ago

I am curious the need for 70 t/sec?

Aeolun · 7h ago

Waiting minutes for your call to succeed is too frustrating?

ekianjo · 3h ago

Depends entirely on the use case. Not every LLM workflow is a chatbot

A tool for burning visible pictures on a compact disc surface (github.com)

Film Packaging Archive (fp-archive.com)

Prophet: Automatic Forecasting Procedure (2023) (github.com)

Best place for small remote gigs?

Wish you weren't here? Why people of all ages want to leave the UK (ft.com)

Mind the Trust Gap: Fast, Private Local-to-Cloud LLM Chat (hazyresearch.stanford.edu)

OpenAI Operator vs. Claude Computer Use: The Definitive 2025 Comparison (agentrank.tech)

How old are the Dead Sea Scrolls? An AI model can help (economist.com)

Landscapes that enhance natural sounds and minimise noise pollution (theconversation.com)

Cities around the world are sinking at "worrying speed" (bbc.co.uk)

IBM Quantum Learning (learning.quantum.ibm.com)

How Red Hat just transformed enterprise server Linux (zdnet.com)

iFixit says the Switch 2 is even harder to repair than the original (theverge.com)

Show HN: Swagger-RAG – Search Swagger API Docs with LLM (github.com)

Japan Create Plastic That Dissolves in Seawater Within Hours (seasia.co)

Correlation between physical fitness and risk of death confounded by genetics (academic.oup.com)

Self-Referential Abstractions (lcamtuf.substack.com)

Microsoft's Recall feature is still threat to privacy despite recent tweaks (adguard.com)

Enable Dark Theme in LibreWolf (bitwilli.com)

Hacker News is a stronghold of transhumanist propaganda and fascist censorship

Up and running–first room-temperature quantum accelerator of its kind in Europe (nachrichten.idw-online.de)

Scientists Create an Artificial Eye That Could Give AI Human-Like Sight (studyfinds.org)

Low-Level Optimization with Zig (alloc.dev)

HZ-program (Typesetting algorithm by Hermann Zapf) (en.wikipedia.org)

Windows 10 spies on your use of System Settings (2021) (michaelhorowitz.com)

Weird mouse-gesture remote configuration file? (2024) (forum.netgate.com)

Physicists observe a new form of magnetism (news.mit.edu)

The Plastic Surgery Procedure Booming Among Washington Men (politico.com)

Forever promises to preserve photos, files 100 years beyond death (web.archive.org)

First usermode exploit for Nintendo Switch 2 (wololo.net)

Natalie Haynes's guide to TV detectives: #9 – Sherlock Holmes (2012) (theguardian.com)

Working with the EPA to Secure Exposed Water HMIs (censys.com)

Building a Slow Web (goodinternetmagazine.com)

Mixtela Precision Clock MkIV (mitxela.com)

Waymos are getting assertive: driverless taxis are learning to drive like humans (sfchronicle.com)

Faster Dashboards with Multi-Column Approximate Sorting (duckdb.org)

Russia steers Shahed drones in Ukraine via Telegram messenger (bulgarianmilitary.com)

Baumol Effect (en.wikipedia.org)

From Boolean logic to bitmath and SIMD: transitive closure of tiny graphs (bitmath.blogspot.com)

A real-world AI alignment failure (pastebin.com)

Why Philosophy of Physics? (aeon.co)

An ancient river landscape preserved beneath the East Antarctic Ice Sheet (2023) (nature.com)

Farewell, NOAA-18 (cimss.ssec.wisc.edu)

Goa Gajah (en.wikipedia.org)

Do Programming Language Features Deliver on Their Promises [video] (youtube.com)

Ultrahuman Home (ultrahuman.com)

Keeping the Web Up Under the Weight of AI Crawlers (eff.org)

AImmerse Web App

Is Jordan Peterson Just Making It Up as He Goes? (thewalrus.ca)

The FAIR Package Manager: Decentralized WordPress infrastructure (joost.blog)

Workhorse LLMs: Why Open Source Models Dominate Closed Source for Batch Tasks

Comments (15)