GRPO experiment - I trained a Language Model to schedule events

anakin87 · 5/10/2025, 9:26:34 AM · github.com

Comments (1)

anakin87 · 3h ago
I have been experimenting with GRPO (Group Relative Policy Optimization) lately, since I am fascinated by models learning from prompts and rewards alone - no example answers needed, unlike in Supervised Fine-Tuning.

After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game, but I wanted a different challenge.

So I opted for teaching a model to create a schedule from a list of events and priorities.

Choosing an original problem forced me to think about the problem setting, generate data, choose the base model, design reward functions, and run multiple rounds of training, hoping that my model would learn something.

A fun and rewarding experience :-)

I learned a lot of things that I want to share with you.
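To make the setup concrete, here is a minimal sketch in the style of TRL's GRPOTrainer (which Unsloth builds on). The model id, the toy dataset, and the reward function are illustrative assumptions, not the exact configuration from the repo below - the point is simply that the dataset contains only prompts, and the reward functions provide the whole training signal.

```python
# Minimal GRPO sketch with TRL's GRPOTrainer. Everything concrete here
# (model id, dataset contents, reward) is an assumption for illustration.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical prompt-only dataset: task descriptions, no gold answers.
train_dataset = Dataset.from_list([
    {"prompt": "Create the best schedule for these events: standup (9-10, priority 2), "
               "review (9-11, priority 5), lunch (12-13, priority 1). "
               "Wrap your answer in <schedule> tags."}
])

def format_reward(completions, **kwargs):
    # Reward completions that wrap the answer in <schedule> tags.
    return [
        1.0 if "<schedule>" in c and "</schedule>" in c else 0.0
        for c in completions
    ]

training_args = GRPOConfig(
    output_dir="qwen-scheduler-grpo",
    num_generations=8,  # completions sampled per prompt to form the GRPO group
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # hypothetical base model choice
    reward_funcs=[format_reward],      # one or more reward functions, scored per completion
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```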

---

- Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo

- Code: https://github.com/anakin87/qwen-scheduler-grpo

- Hugging Face collection (dataset and model): https://huggingface.co/collections/anakin87/qwen-scheduler-g...

---

Some hot takes from my experiment

- GRPO is cool for verifiable tasks, but it is more about eliciting desired behaviors from the trained model than teaching it completely new skills. https://arxiv.org/abs/2504.13837

- Choosing the right base model (and size) matters.

- "Aha moment" might be over-hyped. https://oatllm.notion.site/oat-zero

- Reward function design is crucial. If your rewards are not robust, you might experience reward hacking (as happened to me) - see the sketch after this list.

- Unsloth is great for saving GPU memory, but beware of bugs.
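To illustrate the reward-hacking point, here is a sketch of the kind of check that makes a scheduling reward harder to game: only score a proposed schedule if every event comes from the input and no two chosen events overlap. The `events` column, the toy parser, and the exact scoring are hypothetical, not the repo's actual reward functions; in TRL, extra dataset columns such as this hypothetical `events` column are forwarded to reward functions as keyword arguments, which is what makes validity checks against the original input possible.

```python
import re

def parse_schedule(text):
    # Toy parser (assumption): reads lines like "9-10 standup" inside
    # <schedule>...</schedule> and returns (start, end, name) tuples.
    block = re.search(r"<schedule>(.*?)</schedule>", text, re.DOTALL)
    if block is None:
        return []
    chosen = []
    for line in block.group(1).strip().splitlines():
        m = re.match(r"\s*(\d+)-(\d+)\s+(.+)", line)
        if m:
            chosen.append((int(m.group(1)), int(m.group(2)), m.group(3).strip()))
    return chosen

def overlaps(a, b):
    # (start, end, name) tuples: True if the two time ranges intersect.
    return a[0] < b[1] and b[0] < a[1]

def schedule_reward(completions, events, **kwargs):
    # "events" is assumed to be a dataset column: one dict per prompt,
    # mapping "start-end name" -> priority for the allowed events.
    rewards = []
    for completion, allowed in zip(completions, events):
        chosen = parse_schedule(completion)
        keys = [f"{s}-{e} {name}" for s, e, name in chosen]
        invented = any(k not in allowed for k in keys)
        clashing = any(
            overlaps(a, b)
            for i, a in enumerate(chosen)
            for b in chosen[i + 1:]
        )
        if not chosen or invented or clashing:
            # Hard zero: otherwise the model can inflate its score by
            # duplicating events or inventing high-priority ones.
            rewards.append(0.0)
        else:
            rewards.append(float(sum(allowed[k] for k in keys)))
    return rewards
```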