Group Sequence Policy Optimization

Comments (1)

kdavis · 7h ago

This paper introduces Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant reinforcement learning algorithm for training large language models. Unlike previous algorithms that adopt token-level importance ratios, GSPO defines the importance ratio based on sequence likelihood and performs sequence-level clipping, rewarding, and optimization. We demonstrate that GSPO achieves superior training efficiency and performance compared to the GRPO algorithm, notably stabilizes Mixture-of-Experts (MoE) RL training, and has the potential for simplifying the design of RL infrastructure. These merits of GSPO have contributed to the remarkable improvements in the latest Qwen3 models.
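For anyone who wants the gist in code: below is a minimal sketch of what a sequence-level importance ratio with sequence-level clipping could look like, in plain Python. The abstract above does not give the formulas, so several details here are assumptions rather than the authors' implementation: the length normalization of the ratio (a geometric mean over tokens), the group-normalized advantage, the clipping range `eps=0.2`, and all function names are illustrative only.

```python
import math
from statistics import mean, pstdev


def sequence_ratio(new_logps, old_logps):
    """Sequence-level importance ratio from per-token log-probs.

    Assumption: the ratio is length-normalized, i.e. the sequence
    likelihood ratio raised to the power 1/len(sequence).
    """
    assert new_logps and len(new_logps) == len(old_logps)
    delta = sum(n - o for n, o in zip(new_logps, old_logps))
    return math.exp(delta / len(new_logps))


def group_advantages(rewards):
    """Assumed advantage: reward standardized within the group of responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


def sequence_clipped_objective(new_logps, old_logps, rewards, eps=0.2):
    """PPO-style clipped objective with one ratio per sequence, not per token.

    new_logps / old_logps: one list of token log-probs per response.
    rewards: one scalar reward per response in the group.
    eps: hypothetical clipping range, chosen only for illustration.
    """
    advantages = group_advantages(rewards)
    total = 0.0
    for new, old, adv in zip(new_logps, old_logps, advantages):
        s = sequence_ratio(new, old)
        clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        # The whole response is clipped (or not) as a single unit.
        total += min(s * adv, clipped * adv)
    return total / len(rewards)
```

The contrast with a token-level scheme is that every token in a response shares one ratio, so the response is either clipped or not as a whole. In a real training loop this quantity would be computed on tensors and maximized by gradient ascent; the scalar version above is only meant to show the structure.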
