Show HN: I replicated GRPO and made it one-click runnable on HPC-AI.com

20 points by cheerGPU | 6/23/2025, 10:19:10 AM | hpc-ai.com | 6 comments
Hi HN,

I’m excited to share RUNRL JOB, our new one-click service for running Reinforcement Learning Fine-Tuning (RFT) workloads—think GRPO, PPO, or any custom reward-based tuning—directly on HPC-AI.com.

What It Is

Pre-wired RFT pipeline: dual-network configs, memory optimizations, logging, and reward modules are all set up for you.

Model support: demos with Qwen-3B and Qwen-1.5 work out of the box; drop in your own model if you like.

Cost & performance transparency: real-hardware benchmarks on 8× H100/H200, with live metrics in TensorBoard and built-in cost tracking.

Why It Matters

Memory-efficient GRPO: up to 40% memory savings vs. PPO, since GRPO needs no separate value network and no second backward pass for a critic.

Zero setup: no Dockerfiles, no dependency hell—just click “Start” and your training job spins up.

Accessible RLHF: lowers the barrier for researchers, students, and indie hackers to experiment at scale.
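
The memory claim above comes from GRPO's core design: instead of PPO's learned value baseline, it normalizes each sampled completion's reward against its own group's statistics. A minimal sketch of that group-relative advantage computation (illustrative only, not HPC-AI's actual implementation):

```python
import statistics

def grpo_advantages(group_rewards):
    """Group-relative advantages as used by GRPO: each completion's
    reward is normalized by the mean and std of its sampling group,
    so no learned value network (critic) is needed as a baseline."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # fallback avoids div-by-zero on uniform groups
    return [(r - mean) / std for r in group_rewards]

# Example: 4 completions sampled for one prompt, scored by a reward model
advantages = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Advantages within a group sum to zero, so the policy gradient pushes probability toward above-average completions and away from below-average ones without ever training a critic.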

How to Try

Visit the blog post: https://hpc-ai.com/blog/RUNRL_JOB_is_live_on_hpc-ai

Click “Launch GPU Instances” and choose H100 or H200.

Select the RUNRL JOB template and hit “Start Job”.

Monitor progress live in JupyterLab or via TensorBoard—zero extra setup.

Comments (6)

thisisaacc · 8h ago
It's interesting: most clouds can only provide SFT, not the latest RFT.
cheerGPU · 8h ago
Would love any feedback if you give it a try!
icemount · 8h ago
Is it free?
cheerGPU · 8h ago
Sign up today to claim $6 credit!
adamfly · 8h ago
You are the GOAT GPU Cloud
cheerGPU · 8h ago
If you're training big models or running GRPO at scale, we’re here to make it fast, affordable, and hassle-free. Let me know if you ever need a trial code or want to spin something up — HPC-AI.COM's got you covered!