Opportunities and Challenges of Applying Generative AI to Designing New Drugs
Generative AI is beginning to transform everything from content creation to workflows across industries, but can it create new drugs? Drug design is both a natural fit and a unique challenge for generative AI, given large but messy datasets, the time and cost of real-world testing, and the complexity of human trials.
We at Menlo Ventures recently sat down with three founders, Namrata Anand (Founder and CEO of Diffuse Bio), Mark DePristo (Co-Founder and CEO of BigHat Biosciences), and Lucas Nivon (Co-Founder and CEO of Cyrus Biotech), to discuss applying the latest AI technologies to drug design. We wanted to explore the historical context for these new technologies, separate hype from reality, and get their perspective on what comes next at the intersection of AI and biology. Here are five key takeaways from that discussion.
1. Savvy data scientists have been using generative AI and transformer models on biological datasets for years
The burst of interest in recent news cycles might make it seem like generative AI models were born overnight, but researchers in both academia and industry have been building and applying these tools for years. As Mark pointed out, the original transformer paper, “Attention Is All You Need,” came from Google in 2017. Multiple biotechs with advanced AI capabilities, including Menlo portfolio companies Genesis Therapeutics and Recursion, as well as others such as Schrödinger and Insilico Medicine, already have generative AI models embedded in their drug development processes, starting with small molecules and extending into biologics.
2. But we are now seeing a quantum leap forward in results, including the generation of increasingly complex molecules
Just as in other fields, generative AI models in drug design have shown dramatically improved performance in recent months, including success with more complex molecules such as proteins and antibodies. When Namrata started working on protein design at Stanford seven years ago, few people foresaw the application of diffusion models to designing complex proteins de novo. As with many AI applications, though, once new methods are shown to work in publications and open-source toolkits, large-scale LLMs and diffusion models are rapidly adopted in academic labs and ultimately in industry.
Lucas pointed out that AI has been applied to protein structure prediction and design since at least 2008, when he was a postdoc in David Baker’s lab. Early AI models were often less accurate than statistical and physics-based modeling. However, the integration of statistical and physical models into new AI tools such as AlphaFold, RoseTTAFold, OpenFold, and RFdiffusion has massively increased accuracy and enabled iterative in silico testing to further improve performance.
3. Quantity of training data matters, but data quality and completeness matter even more
Will accurate training of generative models for drug design require ultra-large generalized datasets? Mark agreed that large datasets are important for training, but he drew an analogy to self-driving cars. One day of driving can provide training data for about 80% of the different scenarios a car will encounter. However, to deploy self-driving cars safely, you need to collect high-quality data on rare but critical “edge case” events, which requires much more time and money. For biotechs that are developing specific types of drugs, training their foundation models specifically for the class of molecule they are trying to develop will be critical to the final product. Although massive public databases exist, public data may not be clean enough to tune the best models. At BigHat, the focus has been on pairing AI-driven design with automated synthesis and testing infrastructure so the team can test and iterate every week.
4. Know the limits of generative AI and how to use it best
Lucas noted that each drug must be optimized across many parameters at once (for example, efficacy vs. toxicity) in order to succeed in human trials. Current AI models can optimize individual parameters, such as binding affinity, but not all of the properties required to make a drug simultaneously. Our industry will need a variety of models for different parameters (such as pharmacokinetics, oral bioavailability, and cell type compatibility), and will need to integrate them to improve overall performance.
Namrata extended the autonomous driving analogy to highlight the value of keeping a “human in the loop.” Fully autonomous driving under all conditions (Level 5) may take longer than expected, but partial automation, or automation of specific use cases, may pave the way. Developing a drug without human input or experimental validation shouldn’t be the ultimate goal; using technology to help humans make better drugs for unmet clinical needs faster is more practical and productive in the near term.
5. Integrating tech and bio is critical for success
All three founders emphasized their use of the design-build-test cycle to iterate rapidly and feed data back into their models. Successful iteration requires tight coordination between traditionally siloed computational and biology/chemistry teams. Mark emphasized that while LLMs are rapidly disseminating, integrating their outputs with wet lab and real-world testing is challenging. Lucas added that AI cannot simply be bolted on; it requires integration at all levels to enable high-velocity iteration, which is critical to improving performance.
Thank you to Namrata, Mark, and Lucas for sharing insights and experiences from building AI-enabled drug development platforms. We believe that generative AI, together with in silico prediction and experimental validation, will accelerate development of novel medicines in areas of tremendous unmet need. As an early investor in companies operating at the intersection of tech and bio, we at Menlo are excited to support existing pioneers in AI-enabled drug development, like Recursion and Genesis Therapeutics, and to continue to support the next generation of founders building new platforms for breakthrough medicines.