GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM

82 points by zigzag312 | 19 comments | 8/11/2025, 10:02:08 AM | old.reddit.com

Comments (19)

jmkni · 45m ago
If you run these on your own hardware, can you take the guard-rails off (i.e. "I'm afraid I can't assist with that"), or are they baked into the model?
hnuser123456 · 42m ago
You need to find an abliterated finetune, where someone sends prompts that would hit the guardrails, traces the activated neurons, finds the pathway that leads to refusal, and deletes it.
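Roughly, the idea is directional ablation: estimate a "refusal direction" from the difference in activations between refusal-triggering and harmless prompts, then project that direction out of weights that write into the residual stream. A minimal PyTorch sketch of the concept (placeholder tensors and hypothetical shapes, not any particular abliteration recipe):

    import torch

    # Pretend we already recorded residual-stream activations at some layer
    # for refusal-triggering prompts and for harmless prompts.
    refusing = torch.randn(64, 512)   # (n_prompts, d_model), placeholder data
    harmless = torch.randn(64, 512)

    # Estimate the "refusal direction" as the normalized difference of means.
    d = refusing.mean(0) - harmless.mean(0)
    d = d / d.norm()

    # Project that direction out of a weight that writes into the residual
    # stream (e.g. an MLP down-projection): W <- (I - d d^T) W.
    W = torch.randn(512, 512)         # stand-in for a real model weight
    W_ablated = W - torch.outer(d, W.T @ d)

    # Outputs of the ablated weight now have ~zero component along d,
    # so this layer can no longer push the model toward refusal along d.
    x = torch.randn(512)
    print(torch.dot(W_ablated @ x, d))  # ~0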
generalizations · 24m ago
I've been hearing that in this case there might not be anything underneath: that somehow OpenAI managed to train it exclusively on sterilized synthetic data or something.
gostsamo · 18m ago
I jailbroke the smaller model with a virtual-reality game scenario in which it was ready to give me instructions on making drugs, so there is some data in there that is edgy enough.
mattpavelle · 5m ago
Yes, but the abliterated versions (those with partially removed guardrails) are significantly “dumber”, so the trade-off isn’t worthwhile imho.
stainablesteel · 43m ago
They're baked in, but there's a community of people who crack and modify them.

Even ChatGPT will help you crack them if you ask it nicely.

tyfon · 1h ago
I have a 5950X with 128 GB RAM and a 12 GB 3060 GPU. The speed of generating tokens is excellent; the killer is that once the context grows even a little, processing it is super slow. Hopefully someone smart will optimize this, but as it is now I keep using other models like Qwen, Mistral and Gemma.
MaxikCZ · 59m ago
I would so appreciate concrete data instead of subjective terms like "excellent" and "super slow".

How many tokens per second count as excellent? How many as super slow? And at what context size does the slowdown kick in?

qrios · 45m ago
Some numbers are posted in the comments:

> … you can expect the speed to half when going from 4k to 16k long prompt …

> … it did slow down somewhat (from 25T/s to 18T/s) for very long context …

Depending on the hardware configuration (amount of VRAM, speed of CPU and system RAM) and the llama.cpp parameter settings, a bigger context prompt slows the T/s figure significantly, but not by orders of magnitude.

Bottom line: gpt-oss 120B on a small GPU is not the proper setup for chat use cases.

tyfon · 50m ago
I'm not really timing it, as I just use these models via Open WebUI, nvim, and a few things I've made like a Discord bot, everything going through Ollama.

But for comparison, it generates tokens about 1.5 times as fast as Gemma 3 27B QAT or Mistral Small 2506 Q4. Prompt/context processing, however, seems to happen at about a quarter of the speed of those models.

To make "excellent" a bit more concrete: once the context is processed, I can't really notice any difference in speed between gpt-oss-120b and Claude Opus 4 via the API.

HPsquared · 55m ago
People read at a rate of around 10 tokens/sec, so anything faster than that is pretty good, but it depends on how wordy the response is (including chain of thought) and whether you'll be reading it all verbatim or just skimming.
captainregex · 45m ago
What are you aiming to do with these models that isn’t chat/text manipulation?
GTP · 32m ago
LLM noob here. Would this optimization work with any MoE model, or is it specific to this one?
magicalhippo · 27m ago
It's just doing a regex on the layer names, so it should work with other models as long as they have the expert layers named similarly.

It worked with Qwen 3 for me, for example.

The option is just a shortcut; you can provide your own regex to move specific layers to specific devices.
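For intuition, the mechanism described above boils down to "first matching pattern wins". A toy Python sketch of the idea (the patterns, tensor names, and device labels are illustrative, not llama.cpp's actual code or exact GGUF names):

    import re

    # Ordered list of (pattern, device) overrides: MoE expert feed-forward
    # tensors go to system RAM, everything else stays on the GPU.
    overrides = [
        (re.compile(r"ffn_.*_exps"), "CPU"),
        (re.compile(r".*"), "GPU0"),
    ]

    tensors = [
        "blk.0.attn_q.weight",
        "blk.0.ffn_gate_exps.weight",
        "blk.0.ffn_down_exps.weight",
    ]

    # Assign each tensor to the device of the first pattern that matches it.
    for name in tensors:
        device = next(dev for pat, dev in overrides if pat.search(name))
        print(f"{name} -> {device}")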

nativeit · 30m ago
…and yet a much more capable model (my own brain) still runs better than this on pop tarts.
NitpickLawyer · 7m ago
Give hydrogen a few billion years, and it starts making fun of the inefficiencies in silicon-based siblings.
MaxikCZ · 17m ago
Your comment will get downvoted to invisibility anyway (or mayhaps even flagged), but I have to ask: what are you trying to accomplish with comments such as this? Just shitting on it because it isn't as good as you'd like yet? You want the best of tomorrow today, and will only be rambling about how it's not good enough yesterday?
gjsman-1000 · 15m ago
Well, now I have to ask what your purpose in calling him out is. Does it deeply offend you that non-believers exist who do not believe the technology will improve substantially in usefulness from here?
amelius · 2h ago
But how many micro-Einsteins does it have?