You need to find an abliterated finetune, where someone sends prompts that would hit the guardrails, traces the activated neurons, finds the pathway that leads to refusal, and deletes it.

vorticalbox · 1h ago

huihui-ai[1] on Hugging Face has abliterated models, including a gpt-oss 20B[2], and you can download a few from Ollama[3] too.

If you are interested, you can read about how it's removed[4].

[1] https://huggingface.co/huihui-ai
[2] https://huggingface.co/collections/huihui-ai/gpt-oss-abliter...
[3] https://ollama.com/huihui_ai
[4] https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in...
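To make the "find the pathway and delete it" idea more concrete, here is a minimal sketch of one common formulation: take the difference between mean activations on refused and answered prompts, treat it as a single direction, and project that direction out. The vectors are made-up toy numbers, and the function names (`refusal_direction`, `ablate`) are hypothetical. This is not code from any of the linked repositories, and real finetunes work on actual model activations and weights.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def refusal_direction(refused_acts, answered_acts):
    """Difference of the two class means, normalized to a unit vector."""
    diff = [r - a for r, a in zip(mean_vector(refused_acts),
                                  mean_vector(answered_acts))]
    return unit(diff)

def ablate(activation, direction):
    """Remove the component of `activation` that lies along `direction`."""
    scale = dot(activation, direction)
    return [x - scale * d for x, d in zip(activation, direction)]

# Toy 3-dimensional "activations" (assumed values, not from a real model).
refused = [[2.0, 0.1, 0.0], [2.2, -0.1, 0.1]]
answered = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.1]]

direction = refusal_direction(refused, answered)
cleaned = ablate([1.5, 0.3, -0.2], direction)
```

After the projection, `cleaned` has no component left along `direction`, while the other coordinates are untouched.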

generalizations · 1h ago

I've been hearing that in this case there might not be anything underneath: that somehow OpenAI managed to train exclusively on sterilized synthetic data or something.

gostsamo · 1h ago

I jailbroke the smaller model with a virtual reality game where it was ready to give me instructions on making drugs, so there is some data which is edgy enough.

gchamonlive · 1h ago

If you didn't validate the instructions, maybe it just extrapolated from the structure of other recipes and general description of drug composition which most likely is in Wikipedia.

schaefer · 37m ago

Your profile states that you are blind.

I’m struggling to make sense of your story. Why would a blind user bother putting on a VR headset?

_fzslm · 24m ago

I took virtual reality in this case to mean coaxing the text model into pretending it's talking about drugs in the context of the game, not graphical VR.

antx · 25m ago

You do know that some people aren't totally blind, right?

unglaublich · 47m ago

An article some days ago made the case that GPT-OSS is trained on artificial/generated data only. So there _is_ just not a lot of "forbidden knowledge".

https://www.seangoedecke.com/gpt-oss-is-phi-5/

endmin · 36m ago

So basically inbred llm?

mattpavelle · 1h ago

Yes but the abliterated versions (those with partially removed guardrails) are significantly “dumber” so the trade off isn’t worthwhile imho.

stainablesteel · 1h ago

they're baked in but there's a community of people who crack and modify them

even chat gpt will help you crack them if you ask it nicely

p0w3n3d · 20m ago

I wonder if the MLX-optimized version would run on a 64 GB Mac.

CharlesW · 10m ago

LM Studio's heuristics (which I've found to be pretty reliable) suggest that a 3-bit quantization (~50 GB) should work fine.
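As a rough cross-check of that estimate, the weight footprint of a quantized model is approximately parameter count times bits per weight. The sketch below assumes a nominal 120 billion parameters (taken from the model name) and a hypothetical 8 GB of headroom for the OS and context; neither number comes from LM Studio.

```python
def weight_size_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

def fits(size_gb, ram_gb, headroom_gb):
    """True if the model plus an assumed headroom fits in memory."""
    return size_gb + headroom_gb <= ram_gb

N_PARAMS = 120e9  # assumption: nominal count implied by "120B"

three_bit = weight_size_gb(N_PARAMS, 3)  # 45.0
four_bit = weight_size_gb(N_PARAMS, 4)   # 60.0

# Using the ~50 GB figure quoted above and a hypothetical 8 GB headroom:
ok_on_64gb = fits(50, 64, 8)
```

The pure 3-bit figure comes out a little under the ~50 GB quoted above, which is plausible if some tensors are kept at higher precision, though that is a guess on my part.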

tyfon · 2h ago

I have a 5950X with 128 GB RAM and a 12 GB 3060 GPU. The speed of generating tokens is excellent; the killer is that when the context grows even a little, processing it is super slow. Hopefully someone smart will optimize this, but as it is now I keep using other models like Qwen, Mistral and Gemma.

MaxikCZ · 2h ago

I would so appreciate concrete data instead of subjectivities like "excellent" and "super slow".

How many tokens is excellent? How many is super slow? How many is non-filled context?

qrios · 1h ago

Some numbers are posted in the comments:

> … you can expect the speed to half when going from 4k to 16k long prompt …

> … it did slow down somewhat (from 25T/s to 18T/s) for very long context …

Depending on the hardware configuration (size of VRAM, speed of CPU and system RAM) and llama.cpp parameter settings, a longer context prompt slows the T/s number significantly, but not by orders of magnitude.

Conclusion: gpt-oss 120B on a small GPU is not the proper setup for chat use cases.

HPsquared · 2h ago

People can read at a rate of around 10 tokens/sec. So faster than that is pretty good, but it depends on how wordy the response is (including chain of thought) and whether you'll be reading it all verbatim or just skimming.

tyfon · 2h ago

I'm not really timing it as I just use these models via open webui, nvim and a few things I've made like a discord bot, everything going via ollama.

But for comparison, it is generating tokens about 1.5 times as fast as gemma 3 27B qat or mistral-small 2506 q4. Prompt processing, however, seems to run at about 1/4 the speed of those models.

To be a bit more concrete about "excellent": once the context is processed, I can't really notice any difference in speed between oss-120b and claude opus-4 via API.
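The trade-off between those two ratios can be sketched with a simple two-phase latency model: time to process the prompt plus time to generate the reply. Only the 1.5x and 1/4 ratios come from the comment above; the baseline speeds (400 and 10 tokens/sec) and the token counts are hypothetical placeholders, not measurements.

```python
def response_seconds(prompt_tokens, output_tokens, prompt_tps, gen_tps):
    """Prompt-processing time plus generation time, in seconds."""
    return prompt_tokens / prompt_tps + output_tokens / gen_tps

# Hypothetical baseline speeds for the comparison model (tokens/sec).
BASE_PROMPT_TPS = 400.0
BASE_GEN_TPS = 10.0

def compare(prompt_tokens, output_tokens=500):
    """Return (baseline_seconds, gpt_oss_seconds) for one request."""
    baseline = response_seconds(prompt_tokens, output_tokens,
                                BASE_PROMPT_TPS, BASE_GEN_TPS)
    # Ratios from the comment: 1/4 the prompt speed, 1.5x the generation speed.
    gpt_oss = response_seconds(prompt_tokens, output_tokens,
                               BASE_PROMPT_TPS / 4, BASE_GEN_TPS * 1.5)
    return baseline, gpt_oss

short_ctx = compare(200)    # short prompt: faster generation wins
long_ctx = compare(8000)    # long prompt: slow prompt processing dominates
```

Under these assumed numbers the faster generator finishes first on a short prompt and loses on a long one, which matches the experience described above.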

lylejantzi3rd · 13m ago

I've found threads online that suggest that running gpt-oss-20b on ollama is slow for some reason. I'm running the 20b model via LM Studio on a 2021 M1 and I'm consistently getting around 50-60 T/s.

captainregex · 1h ago

What are you aiming to do with these models that isn’t chat/text manipulation?

sunpazed · 58m ago

Don’t have enough RAM for this model; however, the smaller 20B model runs nice and fast on my MacBook and is reasonably good for my use cases. Pity that function calling is still broken with llama.cpp.

tarruda · 54m ago

It is fixed in this PR/branch: https://github.com/ggml-org/llama.cpp/pull/15181

GTP · 1h ago

LLM noob here. Would this optimization work with any MoE model, or is it specific to this one?

magicalhippo · 1h ago

It's just doing a regex on the layer names, so it should work with other models as long as they have the expert layers named similarly.

It worked with Qwen 3 for me, for example.

The option is just a shortcut, you can provide your own regex to move specific layers to specific devices.
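A minimal sketch of that idea, matching tensor names against regex rules to pick a device, might look like the following. This is an illustration only, not llama.cpp's implementation: the function `assign_devices`, the rule format, and the example tensor names are all assumptions made for the example.

```python
import re

def assign_devices(tensor_names, rules, default="gpu"):
    """Map each tensor name to a device; the first matching rule wins."""
    compiled = [(re.compile(pattern), device) for pattern, device in rules]
    placement = {}
    for name in tensor_names:
        placement[name] = default
        for pattern, device in compiled:
            if pattern.search(name):
                placement[name] = device
                break
    return placement

# Illustrative tensor names (assumed naming scheme).
names = [
    "blk.0.attn_q.weight",
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
    "blk.1.attn_q.weight",
    "blk.1.ffn_gate_exps.weight",
]

# One rule: anything that looks like an expert FFN tensor goes to the CPU.
rules = [(r"\.ffn_.*_exps\.", "cpu")]

placement = assign_devices(names, rules)
```

Because the rule only looks at names, it carries over to any model whose expert tensors follow the same naming pattern, and adding more rules lets you send specific layers to specific devices.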

amelius · 3h ago

But how many micro-Einsteins does it have?

GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM (old.reddit.com)

Comments (34)