Show HN: TextPolicy – reinforcement learning for text generation on a MacBook
4 points | teilom | 0 comments | 8/30/2025, 4:34:08 PM | github.com
I built TextPolicy because I wanted a way to study reinforcement learning for text generation without needing a cluster or cloud GPUs. A MacBook is enough.
The toolkit is simple:
Implements GRPO and GSPO algorithms
Provides a decorator interface for custom reward functions
Includes LoRA and QLoRA utilities
Runs on MLX, so it is efficient on Apple Silicon
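To make the first two items concrete, here is a minimal sketch in plain Python of what a decorator-registered reward function and a GRPO-style group-relative advantage might look like. This is not TextPolicy's actual API: the `reward` decorator, the `REWARDS` registry, the `brevity` reward, and `group_relative_advantages` are all hypothetical names made up for illustration, and the normalization (subtract the group mean, divide by the group standard deviation) is the commonly described GRPO formulation rather than anything confirmed from the repository.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

# Hypothetical registry for illustration only; not TextPolicy's real interface.
REWARDS: Dict[str, Callable[[str, str], float]] = {}


def reward(name: str):
    """Return a decorator that registers a reward function under `name`."""
    def register(fn: Callable[[str, str], float]) -> Callable[[str, str], float]:
        REWARDS[name] = fn
        return fn
    return register


@reward("brevity")
def brevity(prompt: str, completion: str) -> float:
    # Toy reward: fewer words score higher; the word count is capped at 20.
    return 1.0 - min(len(completion.split()), 20) / 20


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of completions for the same prompt.

    Each completion is scored relative to its siblings, so no separate
    value network is needed to provide a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, three sampled completions (hard-coded here instead of sampled).
completions = [
    "Yes.",
    "Yes, it is.",
    "Yes, it is, and here is a long explanation of why.",
]
scores = [REWARDS["brevity"]("Is the sky blue?", c) for c in completions]
advantages = group_relative_advantages(scores)
```

In this sketch the shortest completion gets the largest positive advantage and the longest gets a negative one, which is the signal a policy-gradient update would then push toward. Swapping in a different reward function changes that ordering, which is the kind of reward-shaping effect the toolkit is meant to let you explore.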
It is not intended for production. The purpose is learning and experimentation: understanding the algorithms, testing ideas, and seeing how reward shaping affects behavior.
Installation is through pip:
pip install textpolicy
There is a minimal example in the README.
I am interested in feedback on:
the clarity of the API,
the usefulness of the examples,
and whether this lowers the barrier for people new to RL.
Repository: github.com/teilomillet/textpolicy