The ML research team at Voxel51 just released a paper showing that foundation models rival the accuracy of human annotators in labeling large visual datasets, at several orders of magnitude less time and cost.
We also found that models trained on these labels perform about as well as those trained on human labels when evaluated on public validation sets. Interestingly, setting a relatively low confidence threshold (0.2–0.5) for the auto-generated labels maximized downstream model performance; very high confidence thresholds often produced worse results due to reduced recall.
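For example, applying such a threshold in FiftyOne is a one-liner (a minimal sketch; the dataset name and the "predictions" field are illustrative, and it assumes zero-shot detections with per-label confidences are already stored):

    import fiftyone as fo
    from fiftyone import ViewField as F

    # Load a dataset whose samples carry zero-shot predictions
    # ("my-dataset" is a hypothetical name)
    dataset = fo.load_dataset("my-dataset")

    # Keep only auto-generated labels at or above the chosen threshold
    high_recall_view = dataset.filter_labels(
        "predictions", F("confidence") >= 0.2
    )

You can then export or train directly from the filtered view.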
The upshot is that zero-shot labeling can replace human annotation for many datasets, and the massive cost savings can be redirected toward training larger models.
Happy to answer any questions about the research. You can also read the blog post we wrote, which goes into more depth on the methods and tools we used:
https://link.voxel51.com/HN-VAL-blog/