Show HN: Scoring dev work by how hard it is for AI to copy it
Developer work is unusually public. Beyond git diffs, you can see PR comments and Linear threads, and get a sense of both the complexity of the work and how people collaborate.
I tried a little adversarial experiment:

- Take recent commits and have an LLM guess the "spec" (simulating a Linear ticket; I didn’t build that step)
- Ask Claude Code to implement the same thing
- Use another LLM to compare the two solutions blindly
- If the LLM version is worse than the human version, keep giving it hints until it matches or exceeds the human contribution
- More elaborate hints = a higher complexity score (a rough sketch of the loop follows this list)
- Evaluating comments is even simpler. I didn’t try an adversarial approach there, but there’s no reason it wouldn’t work.
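To make the loop concrete, here’s a rough Python sketch of how a single commit could be scored. Everything in it is hypothetical rather than the library’s actual API: the `llm(prompt) -> str` helper stands in for whatever model calls you wire up (spec inference, Claude Code, the blind judge), and `score_commit` / `max_hints` are names invented for illustration.

    from typing import Callable

    def score_commit(human_diff: str, llm: Callable[[str], str], max_hints: int = 5) -> int:
        """Complexity score = number of hints the model needed to match or beat
        the human change (max_hints + 1 if it never caught up)."""
        # 1. Reverse-engineer a "spec" from the human diff (stand-in for a Linear ticket).
        spec = llm(f"Write a short ticket describing the intent of this diff:\n{human_diff}")

        hints: list[str] = []
        for used_hints in range(max_hints + 1):
            # 2. Ask the coding model to implement the spec, plus any accumulated hints.
            prompt = f"Implement this ticket and return a unified diff:\n{spec}"
            if hints:
                prompt += "\nHints:\n" + "\n".join(hints)
            ai_diff = llm(prompt)

            # 3. Blind judge: it sees two anonymous solutions, not which one is human.
            #    (A real run would randomize A/B order to avoid position bias.)
            verdict = llm(
                "Which solution better satisfies the ticket, A or B? Answer 'A' or 'B'.\n"
                f"Ticket:\n{spec}\n\nA:\n{ai_diff}\n\nB:\n{human_diff}"
            )
            if verdict.strip().upper().startswith("A"):
                return used_hints  # AI matched or exceeded the human contribution

            # 4. Otherwise extract one concrete hint from the gap and try again;
            #    the more hints needed, the higher the complexity score.
            hints.append(llm(
                "Give one concrete hint that would help solution A reach the quality "
                f"of solution B:\n\nA:\n{ai_diff}\n\nB:\n{human_diff}"
            ))

        return max_hints + 1

In practice you’d run something like this over each recent commit in a repo and roll the per-commit scores up into a profile of the contributor.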
This turned into a small library I hacked together. You can score devs on repos for fun.
I wonder if managers fall back on numbers simply because they can’t hold all the context of a person’s contributions, and so lose out on nuance. What if an LLM could hold all of the context of your work and give a fairer evaluation? Could we move away from PMs deciding the “what” and engineers deciding the “how”, toward engineers deciding both?
PRs welcome!