One interesting question about AGI, and maybe a reason it could still be a decent way off: I wonder whether it would require an evolutionary process. Maybe you could do this with smaller models. Instead of training one relatively big model, you train a million random models, pick the best of those, use them to seed something like two million new models with mutations, then rinse and repeat for a while.
Whatever the details, the basic idea is that instead of training a handful of models at a time, we should be training millions.
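Here's a toy sketch of the select-and-mutate loop described above, just to make the idea concrete. It assumes a "model" is nothing more than a small list of weights scored by a made-up fitness function (distance to a fixed target vector), and the population sizes are scaled way down from millions so it runs in a second. The task, the names (`fitness`, `mutate`, `evolve`), and all the numbers are hypothetical; real neuroevolution would score actual networks on actual tasks.

```python
import random

# Hypothetical toy task: a "model" is a list of weights, and fitness is how
# close it gets to a fixed target vector (higher is better, 0 is perfect).
TARGET = [0.5, -1.0, 2.0, 0.0]

def fitness(model):
    return -sum((w - t) ** 2 for w, t in zip(model, TARGET))

def random_model(rng):
    # A fresh random "model": weights drawn uniformly from [-3, 3].
    return [rng.uniform(-3.0, 3.0) for _ in TARGET]

def mutate(model, rng, scale=0.1):
    # Copy the parent and add small Gaussian noise to every weight.
    return [w + rng.gauss(0.0, scale) for w in model]

def evolve(start_size=1000, generations=10, keep=20, growth=2, max_size=5000, seed=0):
    rng = random.Random(seed)
    population = [random_model(rng) for _ in range(start_size)]
    size = start_size
    for _ in range(generations):
        # Select the best few models from this generation.
        parents = sorted(population, key=fitness, reverse=True)[:keep]
        # Grow the next generation (capped here so the toy stays fast).
        size = min(size * growth, max_size)
        # Keep the parents unchanged and fill the rest with mutated copies.
        children = [mutate(rng.choice(parents), rng) for _ in range(size - keep)]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print([round(w, 2) for w in best], round(fitness(best), 4))
```

Carrying the parents over unchanged (so-called elitism) means the best score can never get worse from one generation to the next; the mutated children are what let it improve.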