Show HN: TextPolicy – reinforcement learning for text generation on a MacBook

4 teilom 0 8/30/2025, 4:34:08 PM github.com
I built TextPolicy because I wanted a way to study reinforcement learning for text generation without needing a cluster or cloud GPUs. A MacBook is enough.

The toolkit is simple:

- Implements the GRPO and GSPO algorithms
- Provides a decorator interface for custom reward functions
- Includes LoRA and QLoRA utilities
- Runs on MLX, so it is efficient on Apple Silicon

It is not intended for production. The purpose is learning and experimentation: to understand algorithms, to test ideas, and to see how reward shaping affects behavior.

Installation is through pip:

  pip install textpolicy

There is a minimal example in the README.

I am interested in feedback on the clarity of the API, the usefulness of the examples, and whether this lowers the barrier for people new to RL.

Repository: github.com/teilomillet/textpolicy
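
For readers new to the ideas mentioned above, here is a rough sketch of how a decorator-registered reward function and GRPO-style group-relative advantages might fit together. This is plain Python and does not use TextPolicy: the `reward` decorator, the `REWARDS` registry, and the `brevity` function are hypothetical names made up for illustration, and the library's real API may look quite different (see the README). The advantage step follows the usual GRPO recipe of normalizing each reward by its group's mean and standard deviation.

```python
import math
from typing import Callable, Dict, List

# Hypothetical registry for illustration only; NOT TextPolicy's actual API.
REWARDS: Dict[str, Callable[[str, str], float]] = {}


def reward(name: str):
    """Decorator that registers a reward function under a name."""
    def register(fn: Callable[[str, str], float]) -> Callable[[str, str], float]:
        REWARDS[name] = fn
        return fn
    return register


@reward("brevity")
def brevity(prompt: str, completion: str) -> float:
    """Toy reward: shorter completions score higher (1 / (1 + word count))."""
    return 1.0 / (1.0 + len(completion.split()))


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: center and scale rewards within one group
    of completions sampled for the same prompt."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, one group of sampled completions (hard-coded here).
prompt = "Is MLX fast?"
completions = ["yes", "yes it is", "well, that depends on many things"]

scores = [REWARDS["brevity"](prompt, c) for c in completions]
advantages = group_relative_advantages(scores)

for completion, adv in zip(completions, advantages):
    print(f"{adv:+.3f}  {completion!r}")
```

Completions that score above the group average get a positive advantage and those below get a negative one, so changing the reward function directly changes which samples the policy update favors. That is the kind of reward-shaping effect this sort of toolkit lets you experiment with.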
