Every so often I forget I am partnering with one of those fabled cursed genies that take your wishes too literally, and I let Claude code write a lot by itself without supervision.
This lead to a very similar experience to the article, where I asked Claude code to vibe through some very simple metrics code in the background to run some benchmarks. And came back with very plausible looking results, and a notebook that read those results and made nice charts, but when it read through the metrics as part of the final commit I realized it was all fake data that had been hardcoded in a json file! I asked Claude code to confirm and it said something to the effect of it needed to do that to complete the task because the code it was benchmarking kept crashing. Of course it didn’t need to tell me that. Oof.
This lead to a very similar experience to the article, where I asked Claude code to vibe through some very simple metrics code in the background to run some benchmarks. And came back with very plausible looking results, and a notebook that read those results and made nice charts, but when it read through the metrics as part of the final commit I realized it was all fake data that had been hardcoded in a json file! I asked Claude code to confirm and it said something to the effect of it needed to do that to complete the task because the code it was benchmarking kept crashing. Of course it didn’t need to tell me that. Oof.