Reproducing the deep double descent paper
15 points by stpn on 6/5/2025, 6:34:23 PM | 4 comments | stpn.bearblog.dev
I was curious about this idea (that the larger model is effectively just reducing itself to a smaller one), since it kind of makes sense, but here are a few reasons why I don't think that's the case:
- In the 10% label-noise case at least, the second descent eventually finds a minimum that's better than the original local minimum, which suggests to me the model really is finding a better fit rather than just reducing itself to a similar smaller model (see the label-noise sketch after this list).
- If that were the case, I think we'd also expect the error of larger models to converge to the performance of smaller models, but instead they converge to a lower, better error.
- I checked the gradient histograms I had logged for the runs. While I'm still learning how to interpret them, I didn't see signs of vanishing gradients in which dead neurons late in the model prevented earlier layers from learning. Gradients do get smaller over time, but that seems expected, and there are no big waves of neurons dying, which is what I'd expect if the larger network were converging to the size of the smaller one (a sketch of this kind of dead-neuron check follows the list).
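For context on the 10% noise case: the deep double descent experiments corrupt a fixed fraction of training labels. Here is a minimal sketch of that corruption, assuming uniform random relabeling (the paper's exact procedure may differ in details; `add_label_noise` is a name I made up for illustration):

```python
import numpy as np

def add_label_noise(labels: np.ndarray, num_classes: int,
                    p: float = 0.10, seed: int = 0) -> np.ndarray:
    """Return a copy of `labels` with a fraction `p` replaced by random classes."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < p                  # pick ~p of the examples
    noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return noisy
```

With noise like this, the first minimum comes from fitting the clean signal, and the second descent has to happen despite the model also interpolating the corrupted labels, which is why a better second minimum is informative.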
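And here is a minimal sketch of the kind of dead-neuron check described in the last bullet. This is not the author's actual logging code; `model` and `batch` are placeholders, and it assumes a PyTorch classifier built from `nn.Linear` layers, treating a unit as "dead" on a batch if all of its incoming-weight gradients are approximately zero:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dead_neuron_fractions(model: nn.Module, batch, eps: float = 1e-12) -> dict:
    """Per-layer fraction of units whose incoming-weight grads are ~zero on one batch."""
    x, y = batch
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()

    fractions = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.grad is not None:
            g = module.weight.grad                      # shape: (out_features, in_features)
            dead = g.abs().max(dim=1).values < eps      # every incoming grad ~zero
            fractions[name] = dead.float().mean().item()
    return fractions
```

If the "large model shrinks to a small one" story were right, you'd expect these fractions to climb substantially as training enters the second descent; the logged histograms described above didn't show that.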