Ask HN: How do companies like OpenAI, Perplexity fine tune rich output?

4 points by agaase19 | 2 comments | 7/4/2025, 7:45:01 AM
I see fine tuning as one of the major ways companies like OpenAI, Perplexity, and Anthropic (Claude) differ when it comes to providing higher-quality answers (correct me if I am wrong).

One question I'm curious about is how they fine tune on rich data (markdown, HTML outputs, tables, graphs, etc.) at scale. Currently, fine tuning involves the laborious process of carefully editing inputs (prompts) and outputs one by one. This becomes more difficult as the data context grows: one has to carefully examine the input data and provide the right output, including things like formatting, grammar, and UI.

Considering the wide variety of questions they are processing, it amazes me how they do it at scale. Any thoughts?

Comments (2)

pizza · 11h ago
Anything with a linter means, at minimum, free verifiable rewards for RL (though whether something merely parses versus actually looks good is another story). That, plus they have more data than anyone, and it also seems somewhat reasonable that stronger models could learn 'more' from a given instance or set of examples.
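To make the "linter as verifiable reward" idea concrete, here's a minimal sketch in Python of what such a reward function could look like: the model's output earns reward 1.0 only if it parses cleanly, 0.0 otherwise. The function names and the strictness rules are purely illustrative assumptions, not any lab's actual training stack.

```python
# Hypothetical "linter as verifiable reward" sketch: a structured output
# (HTML or JSON) scores 1.0 iff it parses cleanly. All names here are
# illustrative; real RL pipelines would be far more elaborate.
import json
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta"}  # crude allowance for self-closing tags

class _StrictHTML(HTMLParser):
    """Tracks open tags so we can verify every tag is properly closed."""
    def __init__(self):
        super().__init__()
        self.stack = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # A close tag must match the most recent open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.ok = False

def html_reward(text: str) -> float:
    """Reward 1.0 for well-nested HTML, 0.0 otherwise."""
    parser = _StrictHTML()
    try:
        parser.feed(text)
        parser.close()
    except Exception:
        return 0.0
    return 1.0 if parser.ok and not parser.stack else 0.0

def json_reward(text: str) -> float:
    """Reward 1.0 if the output is valid JSON, 0.0 otherwise."""
    try:
        json.loads(text)
        return 1.0
    except ValueError:
        return 0.0
```

The key property is that the check is automatic and objective, so no human has to grade each sample; the caveat in the comment above still applies, since "parses" says nothing about whether the output is well-written.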
agaase19 · 9h ago
Can you elaborate on "linters as verifiable rewards for RL"? Is this something others would find extremely difficult to do?