When I went to work at Google in 2008 I immediately advocated for spending significant resources on the biological sciences (this was well before DeepMind started working on biology). I reasoned that Google had the data mangling and ML capabilities required to demonstrate world-leading results (and hopefully guide the way so other biologists could reproduce their techniques). We made some progress: we used Exacycle to demonstrate some exciting results in protein folding and design, and later launched Cloud Genomics to store and process large datasets for analytics.
I parted ways with Google a while ago (Sundar is a really uninspiring leader), and was never able to transfer into DeepMind, but I have to say that they are executing on my goals far better than I ever could have. It's nice to see ideas that I had germinating for decades finally playing out, and I hope these advances lead to great discoveries in biology.
It will take some time for the community to absorb this most recent work. I skimmed the paper and it's a monster; there's just so much going on.
bitpush · 20m ago
> It's nice to see ideas that I had germinating for decades finally playing out
I'm sure you're a smart person, and probably had super novel ideas, but your reply comes across as super arrogant / pretentious. Most of us have ideas, even impressive ones (here's an example: let's use LLMs to solve world hunger & poverty, and loneliness & fix capitalism), but it'd be odd to go and say "Finally! My ideas are finally getting the attention".
CGMthrowaway · 11m ago
Yeah it comes off as braggy, but it’s only natural to be proud of your foresight
sampl3username · 11m ago
This reads like an LLM generated text.
shadowgovt · 6m ago
FWIW, I interpreted it more as "This is something I wanted to see happen, and I'm glad to see it happening even if I'm not involved in it."
nextos · 1h ago
I found it disappointing that they ignored one of the biggest problems in the field, i.e. distinguishing between causal and non-causal variants among highly correlated DNA loci. In genetics jargon, this is called fine mapping. Perhaps this is something for the next version, but it is really important for designing effective drugs that target key regulatory regions.
One interesting example of such a problem and why it is important to solve it was recently published in Nature and has led to interesting drug candidates for modulating macrophage function in autoimmunity: https://www.nature.com/articles/s41586-024-07501-1
rattlesnakedave · 50m ago
Does this get us closer? Pretty uninformed but seems that better functional predictions make it easier to pick out which variants actually matter versus the ones just along for the ride. Step 2 probably is integrating this with proper statistical fine mapping methods?
nextos · 25m ago
Yes, but it's not dramatically different from what is out there already.
There is a concerning gap between prediction and causality. In problems like this one, where lots of variables are highly correlated, prediction methods that only have an implicit notion of causality don't perform well.
Right now, SOTA seems to use huge population data to infer causality within each linkage block of interest in the genome. These types of methods are quite close to Pearl's notion of causal graphs.
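To make the correlated-variants point concrete, here is a minimal toy sketch (not how any real fine-mapping tool works). It assumes two hypothetical binary variants, `causal` and `tag`, where `tag` copies `causal` about 95% of the time and only `causal` affects the trait. All names, sample sizes, and effect sizes are made up for illustration. Looked at one at a time, both variants correlate with the trait; fitting them jointly pushes the tag variant's effect toward zero.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, flip=0.05, effect=1.0, seed=0):
    """Toy data (hypothetical): `causal` drives the trait, `tag` only mirrors it."""
    rng = random.Random(seed)
    causal, tag, trait = [], [], []
    for _ in range(n):
        g = rng.randint(0, 1)                         # causal allele, 0 or 1
        t = g if rng.random() >= flip else 1 - g      # tag variant in tight LD
        causal.append(g)
        tag.append(t)
        trait.append(effect * g + rng.gauss(0.0, 1.0))  # tag has no direct effect
    return causal, tag, trait


def joint_effects(x1, x2, y):
    """Least-squares slopes for y ~ x1 + x2 (with intercept), from the
    centered 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


if __name__ == "__main__":
    causal, tag, trait = simulate()
    print("marginal r, causal:", round(pearson(causal, trait), 2))
    print("marginal r, tag:   ", round(pearson(tag, trait), 2))
    b_causal, b_tag = joint_effects(causal, tag, trait)
    print("joint slopes (causal, tag):", round(b_causal, 2), round(b_tag, 2))
```

The joint fit only works here because the toy model includes the true causal variant and the LD is not perfect; with near-perfect correlation the 2x2 system becomes ill-conditioned, which is roughly why large population data is needed in practice.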
ejstronge · 22m ago
> SOTA seems to use huge population data to infer causality within each linkage block of interest in the genome.
This has existed for at least a decade, maybe two.
> There is a concerning gap between prediction and causality.
Which can be bridged with protein prediction (AlphaFold) and non-coding regulatory predictions (AlphaGenome) amongst all the other tools that exist.
What is it that does not exist that you "found it disappointing that they ignored"?
nextos · 13m ago
> This has existed for at least a decade, maybe two.
Methods have evolved a lot in a decade.
Note how AlphaGenome prediction at 1 bp resolution for CAGE is poor. Just Pearson r = 0.49. CAGE is very often used to pinpoint causal regulatory variants.
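For a rough sense of scale, and assuming the usual reading of r² as the share of variance captured by a linear fit (the thread does not say how the figure was computed), r = 0.49 works out as follows. The helper name is just for illustration.

```python
def variance_explained(r):
    """Squared Pearson correlation: share of variance captured by a linear fit."""
    return r * r


if __name__ == "__main__":
    print(f"{variance_explained(0.49):.4f}")  # prints 0.2401
```

So under that assumption roughly a quarter of the variance in the 1 bp CAGE signal would be captured.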
Scaevolus · 1h ago
Naturally, the (AI-generated?) hero image doesn't properly render the major and minor grooves. :-)