Show HN: DistilKitPlus, a framework for distillation between any LLMs
9 points by ayushnangia16 | 4 comments | 5/5/2025, 4:12:05 PM | github.com
Over the past few months, I have built a distillation toolkit that supports cross-tokenizer distillation (e.g., distilling a LLaMA teacher into a student with the Qwen vocabulary, or other tokenizer pairs). This approach has worked well on reasoning datasets like AIME, and we've validated it on models like Phi and Qwen.
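For anyone wondering what cross-tokenizer distillation involves mechanically: since teacher and student vocabularies differ, logits can't be matched index-by-index, so one common workaround is to compare the sorted probability distributions at each step. Below is a minimal sketch of that idea; it's an illustrative assumption, not necessarily the exact loss DistilKitPlus implements.

    # Sketch of a cross-tokenizer KD loss: compare sorted probability
    # distributions so vocabularies of different sizes can be matched.
    # Illustrative only -- not claimed to be the repo's actual loss.
    import torch
    import torch.nn.functional as F

    def sorted_prob_kd_loss(student_logits, teacher_logits):
        """student_logits: [B, T, Vs], teacher_logits: [B, T, Vt].

        Assumes the two sequences were already aligned to the same
        length T (e.g., truncated to the shorter tokenization)."""
        s = F.softmax(student_logits, dim=-1).sort(dim=-1, descending=True).values
        t = F.softmax(teacher_logits, dim=-1).sort(dim=-1, descending=True).values
        # Pad the smaller vocabulary with zeros so the sorted vectors line up.
        v = max(s.size(-1), t.size(-1))
        s = F.pad(s, (0, v - s.size(-1)))
        t = F.pad(t, (0, v - t.size(-1)))
        # L1 distance between sorted distributions, averaged over batch and time.
        return (s - t).abs().sum(dim=-1).mean()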
We’ve also integrated Modal for quick deployment (with $30/month credits to try it out).
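For a sense of what the Modal path might look like, here's a rough sketch of a remote distillation job. The app name, image packages, config path, and run_distillation entrypoint are placeholders I made up, not the repo's actual API:

    # Hypothetical Modal deployment sketch -- names/config are placeholders.
    import modal

    app = modal.App("distilkitplus-distill")
    image = modal.Image.debian_slim().pip_install("torch", "transformers", "datasets")

    @app.function(gpu="A100", image=image, timeout=6 * 60 * 60)
    def run_distillation(config_path: str = "configs/llama_to_qwen.yaml"):
        # Inside the container: load teacher/student, run the KD loop, save the student.
        ...

    # Launch from your machine with:  modal run distill_job.py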
Would love any feedback!
Comments (4)
vikramxD · 2h ago
Cool, are you accepting contributions for adding new models?
vijit-singh · 3h ago
This is very cool. Will try it out.
shikharM07 · 5h ago
This is kinda interesting, but I'm curious: what is the smallest model size I can distill to without compromising accuracy?
agokrani · 4h ago
We can distill a 14B model down to a 4B model with performance improvements on AIME24 and GSM8K. We will share our results in a detailed blog post later.