Show HN: I replicated GRPO and made it one-click runnable on HPC-AI.com
I’m excited to share RUNRL JOB, our new one-click service for running Reinforcement Learning Fine-Tuning (RFT) workloads—think GRPO, PPO, or any custom reward-based tuning—directly on HPC-AI.com.
What It Is

Pre-wired RFT pipeline: dual-network configs, memory optimizations, logging, and reward modules are all set up for you (a sketch of a custom reward function follows this list).
Model support: demos with Qwen-3B and Qwen-1.5 out of the box; drop in your own model if you like.
Cost & performance transparency: real-hardware benchmarks on 8× H100/H200, with live metrics in TensorBoard and built-in cost tracking.
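For illustration, here is a minimal sketch of the kind of rule-based reward a custom reward module might compute. The function name math_reward and its signature are hypothetical, not the actual RUNRL JOB interface; it just scores a completion for answer format and correctness.

```python
# Hypothetical rule-based reward function. This illustrates the kind of logic a
# custom reward module might wrap; the signature is an assumption, not the
# actual RUNRL JOB interface.
import re


def math_reward(completion: str, reference_answer: str) -> float:
    """Score a completion: +0.1 for using a \\boxed{} answer, +1.0 if it is correct."""
    reward = 0.0
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match:
        reward += 0.1                                    # format bonus
        if match.group(1).strip() == reference_answer.strip():
            reward += 1.0                                # correctness bonus
    return reward


print(math_reward(r"The answer is \boxed{42}", "42"))    # 1.1
print(math_reward("The answer is 42", "42"))             # 0.0
```

Whatever interface the pipeline exposes, a scalar score like this is ultimately what the RL loop maximizes.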
Why It Matters

Memory-efficient GRPO: up to 40% memory savings vs. PPO, since there is no separate value network and no double backward pass (see the advantage-computation sketch after this list).
Zero setup: no Dockerfiles, no dependency hell—just click “Start” and your training job spins up.
Accessible RLHF: lowers the barrier for researchers, students, and indie hackers to experiment at scale.
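To make the memory argument concrete, here is a minimal sketch of GRPO's group-relative advantage estimate, assuming PyTorch; the function compute_group_advantages is illustrative and not part of RUNRL JOB. Each prompt's sampled responses are scored and normalized against their own group, so no learned value network (critic) is needed as a baseline.

```python
# Minimal sketch of GRPO's group-relative advantage (illustrative only, not the
# RUNRL JOB API). For each prompt, sample a group of responses, score them with
# the reward function, and normalize within the group -- no learned critic.
import torch


def compute_group_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: [num_prompts, group_size] scalar rewards for sampled responses."""
    mean = rewards.mean(dim=-1, keepdim=True)   # per-group baseline
    std = rewards.std(dim=-1, keepdim=True)     # per-group scale
    return (rewards - mean) / (std + eps)       # advantage per response


# Example: one prompt with a group of 4 sampled responses.
rewards = torch.tensor([[0.2, 0.9, 0.5, 0.1]])
print(compute_group_advantages(rewards))
```

Because the baseline is just the group's own statistics, there is no critic to store or update, which is where the savings over PPO come from.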
How to Try

Visit the blog post: https://hpc-ai.com/blog/RUNRL_JOB_is_live_on_hpc-ai
Click “Launch GPU Instances”, choose H100 or H200.
Select the RUNRL JOB template and hit “Start Job”.
Monitor progress live in JupyterLab or via TensorBoard, with zero extra setup (a notebook snippet for pulling up TensorBoard is sketched below).
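If you prefer to drive TensorBoard from inside the JupyterLab session, a notebook cell like the following works with the standard tensorboard Jupyter extension; the log directory is a placeholder, so point it at wherever your job writes its event files.

```python
# Run in a JupyterLab notebook cell. The log directory below is a placeholder --
# replace it with the path your training job uses for TensorBoard event files.
%load_ext tensorboard
%tensorboard --logdir ./logs
```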