AI trained like scientists could bridge interdisciplinary knowledge gaps


Published 10 Dec 2024


A new initiative called Polymathic AI is pushing the boundaries of artificial intelligence (AI) by training models to work like scientists across various fields. Researchers from the University of Cambridge and the Flatiron Institute have released two open-source datasets designed to teach AI systems to connect knowledge across disciplines such as astrophysics, biology, and fluid dynamics.

What the datasets offer

“These datasets are by far the most diverse large-scale collections of high-quality data for machine learning training ever assembled for these fields,” said Michael McCabe from the Flatiron Institute. Released on December 2, 2024, the datasets total 115 terabytes, more than double the size of GPT-3’s training data, and are available on Hugging Face, a platform that hosts AI models and datasets.

The released data consists of two collections. The Multimodal Universe features 100 terabytes of astronomical observations, including galaxy images from NASA’s James Webb Space Telescope. The Well, a 15-terabyte dataset, spans simulations of biological systems, supernova explosions, and acoustic scattering.

While the two datasets may seem disconnected, both require modeling a class of mathematical equations called partial differential equations. These equations arise in a wide range of scientific problems and tend to be incredibly difficult to solve, even for supercomputers.
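
To give a sense of what modeling a partial differential equation involves, here is a minimal Python sketch of one of the simplest cases: the one-dimensional heat equation, solved with a textbook explicit finite-difference scheme. It is an illustration only and is not taken from the Polymathic AI datasets or code. The function names (heat_step, simulate), the grid size, and the parameter values are all assumptions chosen for the example.

```python
def heat_step(u, alpha, dx, dt):
    """Advance the 1D heat equation u_t = alpha * u_xx by one explicit time step.

    The two end values are held fixed (Dirichlet boundary conditions).
    """
    r = alpha * dt / dx**2
    if r > 0.5:
        # The explicit scheme is only stable when alpha*dt/dx^2 <= 1/2.
        raise ValueError("time step too large for a stable explicit scheme")
    new_u = u[:]  # copy so the boundary values carry over unchanged
    for i in range(1, len(u) - 1):
        new_u[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
    return new_u


def simulate(n_points=21, n_steps=200, alpha=1.0, length=1.0):
    """Hypothetical setup: a cold rod with a single hot spot in the middle."""
    dx = length / (n_points - 1)
    dt = 0.4 * dx**2 / alpha  # chosen to stay inside the stability limit
    u = [0.0] * n_points
    u[n_points // 2] = 1.0
    for _ in range(n_steps):
        u = heat_step(u, alpha, dx, dt)
    return u


if __name__ == "__main__":
    final = simulate()
    print(" ".join(f"{value:.4f}" for value in final))
```

Even this toy problem needs a time step small enough to keep the scheme stable, and the amount of computation grows quickly once the problem moves to two or three dimensions with more complicated physics. That hints at why realistic simulations are so demanding.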

Dr. Miles Cranmer of Cambridge’s Institute of Astronomy says, “Just as LLMs such as ChatGPT learn to use common grammatical structure across languages, these new scientific foundation models might reveal deep connections across disciplines that we’ve never noticed before.”

Toward a new era of AI-driven discovery

Current AI tools often specialize in one field, making it difficult to uncover links between different areas of science. The Polymathic AI project aims to overcome this limitation by creating “polymathic” models—AI systems capable of reasoning and generalizing across domains. The initiative’s interdisciplinary team includes physicists, astrophysicists, mathematicians, computer scientists, and neuroscientists.

Dr. Payel Mukhopadhyay, another Cambridge researcher, emphasized the datasets’ potential, saying, “It will be exciting to see if the complexity of these datasets can push AI models to go beyond merely recognizing patterns, encouraging them to reason and generalize across scientific domains.”

The Polymathic AI team has already begun training models using these datasets, with early results described as “exciting.” In the coming months, these AI systems will be tested on tasks such as solving partial differential equations, which are crucial in fields like quantum mechanics and embryo development.

The project marks a shift in AI development toward generalist systems that can drive scientific discovery. “These datasets are opening the door to true generalist scientific foundation models for the first time. What new scientific principles might we discover? We’re about to find out,” Dr. Cranmer said.

By enabling AI to think like scientists, the Polymathic AI project could pave the way for breakthroughs that transcend traditional disciplinary boundaries. Machine learning and related AI techniques are increasingly common in scientific research and were recognized in two of the 2024 Nobel Prizes.

This transformative potential suggests that these models could either serve as powerful tools to guide human researchers or eventually take the lead in addressing challenges that are currently beyond human reach. As AI systems grow more sophisticated, they are likely to become indispensable partners in shaping the future of scientific discovery.