Basecamp Research launches ZymCTRL, a world-first, open-source generative AI tool that designs enzymes for more sustainable industrial processes

publication date: Jul 12, 2024 | author/source: Basecamp Research

  • ZymCTRL is the world’s first open-source, text-based enzyme generation model and can be used across multiple industries, including therapeutics and sustainability initiatives.
  • The AI model generated sequences that yielded functional enzymes with desirable characteristics for industrial applications.

 

Basecamp Research, a world leader in artificial intelligence (AI)-based design of proteins and other biological systems, in partnership with the Ferruz Laboratory at the Institute of Molecular Biology of Barcelona, has announced the release of ZymCTRL (“enzyme control”), a ChatGPT-like tool that generates new enzyme sequences from scratch. The user simply types in an enzyme identification code, which specifies the desired activity.

Large language models (LLMs), such as ChatGPT, have proven useful in helping scientists design and generate protein sequences. However, current models require further training as well as conditioning on a known protein starter sequence (“seed sequence”) for protein generation.

ZymCTRL is a next-generation end-to-end protein LLM that offers rapid, cost-effective design capabilities for generating artificial enzymes. In contrast to other LLMs, the tool requires no seed sequence, giving end users complete control. Another important feature is ZymCTRL’s ability to create enzyme sequences that work but share only 30% resemblance to those in the training set – expanding the possibilities for designing new enzymes.
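Since the only input is an enzyme identification code (an Enzyme Commission, or EC, number), a user may want to check that a code is well formed before using it as a prompt. The following is a minimal Python sketch of such a check, not part of ZymCTRL itself. The function name is hypothetical, and the sketch assumes fully specified codes with four dot-separated numeric fields, such as 1.1.1.27 (L-lactate dehydrogenase) or 4.2.1.1 (carbonic anhydrase).

```python
import re

# Assumption: a complete EC number has a top-level class from 1 to 7
# followed by three further numeric fields, separated by dots.
EC_NUMBER = re.compile(r"[1-7](\.\d+){3}")


def is_complete_ec_number(code: str) -> bool:
    """Return True if `code` looks like a fully specified EC number.

    Hypothetical helper for illustration; it checks the format only and
    does not confirm that the code is assigned to a real enzyme.
    """
    return EC_NUMBER.fullmatch(code.strip()) is not None


if __name__ == "__main__":
    for candidate in ["1.1.1.27", "4.2.1.1", "1.1.1", "lactate dehydrogenase"]:
        print(candidate, is_complete_ec_number(candidate))
```

The check is deliberately strict: partial codes and enzyme names are rejected, because the article describes the prompt as a code rather than free text.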

“With ZymCTRL, generating highly specific enzymes is as easy as interacting with a chatbot,” said Noelia Ferruz, who has been partnering with Basecamp Research for over two years. The Ferruz lab is considered a pioneer in the field of AI for protein design, having previously built ProtGPT2, a deep unsupervised language model for protein design.

“Even before the release of ChatGPT, we began working on large language models with Noelia because we think these models represent the future of biological research and protein design,” said Dr. Philipp Lorenz, CTO of Basecamp Research. “We’re deeply excited by these results and ZymCTRL’s ability to create functional enzymes that can solve some of today’s biggest challenges, from finding new ways to treat devastating diseases to building greener and more sustainable catalytic processes in bioindustry.”

The open-source ZymCTRL model has been independently reviewed by academics in Structural Biology and ChemBioChem, peer-reviewed scientific journals. In ChemBioChem, researchers at the Institute of Biochemistry at Austria’s Graz University of Technology cited ZymCTRL’s efficiency and ease of use. “ZymCTRL designs putative enzyme variants on consumer GPUs within seconds and, remarkably, it creates these sequences with only an EC number as input,” wrote Horst Lechner, principal investigator at the institute, which focuses on enzyme design that differs from what is seen in nature.

Basecamp Research is sharing ZymCTRL open source with researchers and sees an array of potential applications, including designing enzymes for disease treatment and diagnostics, biofuel production, sustainable agriculture innovations and much more.

While ZymCTRL was initially trained on publicly available datasets, it can also be integrated with other datasets, including Basecamp Research’s proprietary BaseGraph database, to further optimise the model and improve sequence outputs.

 

Highlights

ZymCTRL was first trained on the BRENDA enzyme database, comprising 37M enzyme sequences.

From this, the team generated sets of carbonic anhydrases (enzymes that accelerate the conversion of carbon dioxide to bicarbonate, helping capture and store CO2) and lactate dehydrogenases (enzymes that help convert sugar into energy in our cells), with no further fine-tuning of the AI model. After the proteins were produced and purified, several showed enzyme activity despite sharing less than 40% of their sequence with proteins seen in the public database.
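The article does not say how the resemblance between generated and known sequences was measured. One common measure is percent identity over the aligned positions of two sequences, sketched below in Python as an illustration only. The function name and the toy sequences are hypothetical, and the inputs are assumed to be already aligned, with "-" marking gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over positions where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment and therefore
    have equal length; "-" marks a gap. Illustrative helper, not the
    authors' method.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue  # skip gapped positions
        compared += 1
        if residue_a == residue_b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration.
    print(percent_identity("MKV-LA", "MRVGLA"))  # 4 of 5 compared positions match
```

Other conventions divide by the full alignment length or by the shorter sequence, so reported percentages are only comparable when the same definition is used.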

To correct for potential biases in public databases, which have uneven sampling due to a lack of biodiversity, ZymCTRL was adjusted using a wider range of lactate dehydrogenase sequences from Basecamp Research’s proprietary BaseGraph dataset.

With this fine-tuning, the team created lactate dehydrogenases with higher quality scores in silico (in computer simulations), such as better predicted local distance difference test (pLDDT) values, compared to sequences generated with no prior training.

Remarkably, the active enzymes continued to show significant activity at a high temperature of 45°C as well as across a broad pH range of 4.5 to 9.5, meaning they can work or stay stable in slightly acidic to slightly basic environments. This offers significant industrial advantages over naturally occurring lactate dehydrogenases: a single enzyme can be used in many processes that run at different pH levels, which makes it adaptable to a wide range of applications.

Two of the artificial lactate dehydrogenase enzymes were produced in larger amounts and successfully freeze-dried. They kept their activity and showed they could work in complex reactions under harsh conditions, supporting their potential for industrial use.

“Beyond the obvious excitement of being able to generate truly de novo proteins, the results are a further testament to the ability of Basecamp Research’s dataset to produce better results compared to publicly available datasets, which barely scratch the surface of the Earth’s immense biodiversity,” added Dr. Glen Gowers, co-founder of Basecamp Research. “Earlier we were able to show that our BaseFold model, also powered by our dataset, outperformed AlphaFold2 in predicting protein structures. Generative AI is going to have a huge impact across biotech, and we’re dedicated to collecting the data and tools needed to make its potential a reality.” The full preprint can be found HERE

Basecamp Research invites the research community to try ZymCTRL and has released it for public use on Hugging Face.

 

About Basecamp Research

Basecamp Research is a leader in mapping biodiversity for AI-based design of biological systems. We match and refine novel proteins for our partners’ exact industrial, therapeutic or diagnostic applications using BaseGraph™, a new generation of AI design that is powered by the first-ever high-resolution map of global genetic biodiversity. 

Understanding the full genetic, evolutionary, and environmental context of each protein allows Basecamp Research to design tailored proteins for specific applications without the need for expensive and time-consuming directed evolution campaigns. We’re a team of explorers, scientists and policy experts driven by our ambition to protect and learn from nature’s diversity, whilst delivering life-changing breakthroughs to those who need them most. 

 

 



 

Lab Bulletin is published by newleaf marketing communications ltd.


 
