Who will make AlphaFold3 open source? Scientists race to crack AI model

Who will make AlphaFold3 open source? Scientists race to crack AI model

Researchers are aiming to create fully accessible versions of the latest iteration of DeepMind's blockbuster protein-structure model.
  1. Ewen Callaway View author publications

    You can also search for this author in PubMed  Google Scholar

An AlphaFold3 model of a kinase bound to chemical.Credit: Isomorphic Labs

When Google DeepMind unveiled AlphaFold31 — the latest edition of its revolutionary protein-structure-prediction AI — in Nature this month, it came with a hitch. Unlike a previous version2, there was no computer code describing the advance to accompany the paper.

Major AlphaFold upgrade offers boost for drug discovery

The London-based company reversed course days later, promising to release the code by year's end. But the omission has set researchers worldwide racing to develop their own open-source versions of AlphaFold3, an artificial-intelligence (AI) model that can predict a protein's structure, as well as those of other molecules, including potential new drugs. Other scientists are doing their best to hack the web version of AlphaFold3 that DeepMind released to skirt its limitations.

"It would be bad if capabilities that are just so fundamental to our ability to do drug discovery and other things that are relevant for human health end up getting locked up," says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City. His 'OpenFold' team has already begun3 coding an open-source version of AlphaFold3 that they hope to complete this year.

Scientists disappointed

DeepMind's initial withholding of code for AlphaFold3, as well as the 9 May publication in Nature, irked many scientists ( Nature's news team is independent of its journal team). Nature's policies say that code associated with studies should typically be made available, while acknowledging that there can be restrictions.

"This does not align with the principles of scientific progress, which rely on the ability of the community to evaluate, use, and build upon existing work," states an 11 May open letter to Nature co-written by Stephanie Wankowicz, a computational structural biologist at the University of California, San Francisco, and nine other scientists, and which has since been signed by more than 600 researchers.

AlphaFold3 — why did Nature publish it without its code?

In an editorial published on 22 May, Nature said that it welcomed the conversation that the AlphaFold3 publication sparked, and seeks opinions from readers on how it can encourage openness in science. It added that its policies support open science, but acknowledges that the private sector funds most global research and that many findings resulting from such work remain proprietary. "We at Nature think it's important that journals engage with the private sector and work with its scientists so they can submit their research for peer review and publication," it stated. The journal said that it would update the paper with the code when DeepMind releases it.

In lieu of code and parameters obtained from training AlphaFold3 called model weights, DeepMind created a website where researchers can access the tool. But this AlphaFold3 server is restricted: it can be used only for non-commercial research, and it is not possible to obtain structures of proteins bound to possible drugs. The paper describing AlphaFold3 also contains detailed 'pseudocode', outlining how the model works.

Retraining AlphaFold

Roland Dunbrack, a computational structural biologist at Fox Chase Cancer Center in Philadelphia, Pennsylvania, who says that he peer reviewed the AlphaFold3 paper for Nature, says that he was disappointed that DeepMind did not release the code, either for him to review or in the publication. The availability of the AlphaFold2 code widened its reach and enabled researchers to adapt and improve on the tool. "I wanted the downloadable code because of the science that would happen if I and others had access," says Dunbrack, a co-author of the open letter.

On 13 May, days after the backlash began, DeepMind did an about-face and announced that it would make the AlphaFold3 code and model weights available for academic use within six months.

But questions remain as to whether this version of AlphaFold3 will have the full range of capabilities, especially the ability to predict the structure of proteins in conjunction with potential drug molecules, or ligands, say scientists. "I don't think they are going to give us the ability to do any ligand," says Dunbrack. The OpenFold3 model that AlQuraishi's team is developing won't have such limitations, he says, nor any restrictions on commercial use.

There are other reasons that scientists are pursuing open-source versions of AlphaFold3. One, says AlQuraishi, will be the ability to retrain the model to better model interactions between proteins and would-be drugs. His team retrained its version of AlphaFold2 using the same publicly available data sets that DeepMind used. But AlQuraishi expects that many pharmaceutical companies — which have access to troves of experimentally determined structures of proteins bound to possible drugs — will be keen to have a version of AlphaFold3 that they can retrain with their own proprietary data, which could boost the model's performance.

What's next for AlphaFold and the AI protein-folding revolution

AlQuraishi isn't the only scientist trying to learn AlphaFold3's secrets. David Baker, a computational biophysicist at the University of Washington in Seattle, wants to see what can be applied to an open-source protein- and chemical-prediction model that his team developed called RoseTTAFold-All-Atom, which doesn't yet perform as well as AlphaFold3.

And Phil Wang, an independent software engineer in San Francisco has begun a crowdsourced effort to replicate DeepMind's latest model. Wang, who also has a medical degree, has developed open-source versions of dozens of AI models, including the image-generating tool DALL-E. Wang has received financial support for his work from companies to do this in the past, but has not yet received offers to work on opening up AlphaFold3.

Hacked versions

Wang says that his team of three expects to have the code describing the AlphaFold3 model done in a month. But the most time-consuming step will be training the models on experimentally determined protein structures and other data sets, says AlQuraishi. "The code is by far the easier piece. That's 5% of the effort."

It's also likely to prove expensive, says Sergey Ovchinnikov, an evolutionary biologist at the Massachusetts Institute of Technology in Cambridge. It could cost upwards of US$1 million in cloud-computing resources to train AlphaFold3 in the same way that DeepMind did, Ovchinnikov estimates, although it might be possible to cut corners to bring costs down without compromising performance.

A fully open-source version of AlphaFold3 will allow researchers to better understand how the model works and expand its abilities. But some scientists are already trying to do this with the AlphaFold3 server. "There has already been some hacking online," says Ovchinnikov, for instance to obtain more-accurate models of proteins embedded in the cell membrane, where they interact with fat molecules. Another server hack has revealed an alternative shape that one protein adopts.

AlQuraishi hopes that the push to develop open-source versions of AlphaFold3 will serve as a "cautionary tale" to academics about the perils of relying on technology companies such as DeepMind to develop and distribute tools such as AlphaFold. "It's good they did it, but we shouldn't depend on it," he says. "We need to create a public-sector infrastructure to be able to do that in academia."

doi: https://doi.org/10.1038/d41586-024-01555-x

References

  1. Abramson, J. et al. Nature https://doi.org/10.1038/s41586-024-07487-w (2024).

    Article  PubMed  Google Scholar 
  2. Jumper, J. et al. Nature 596, 583–589 (2021).

    Article  PubMed  Google Scholar 
  3. Ahdritz, G. et al. Nature Methods https://doi.org/10.1038/s41592-024-02272-z (2024).

    Article  Google Scholar 

Download references

Reprints and permissions

Related Articles

  • AlphaFold3 — why did Nature publish it without its code?
  • Major AlphaFold upgrade offers boost for drug discovery
  • AlphaFold touted as next big thing for drug discovery — but is it?
  • What's next for AlphaFold and the AI protein-folding revolution
  • How AlphaFold can realize AI's full potential in structural biology
  • Foldseek gives AlphaFold protein database a rapid search tool
  • Read the paper: Highly accurate protein structure prediction with AlphaFold

Subjects

Latest on:

Pay researchers to spot errors in published papers

World View 21 MAY 24

Plagiarism in peer-review reports could be the 'tip of the iceberg'

Nature Index 01 MAY 24

Algorithm ranks peer reviewers by reputation — but critics warn of bias

Nature Index 25 APR 24

Jobs

Related Articles

Subjects

Sign up to Nature Briefing

An essential round-up of science news, opinion and analysis, delivered to your inbox every weekday. Email address Yes! Sign me up to receive the daily Nature Briefing email. I agree my information will be processed in accordance with the Nature and Springer Nature Limited Privacy Policy. Sign up