IBM and NASA’s Marshall Space Flight Center have announced a collaboration to use IBM’s artificial intelligence (AI) technology to uncover new insights in NASA’s massive trove of Earth and geospatial science data. For the first time, the collaborative work will apply AI foundation model technology to NASA’s Earth-observing satellite data.
Foundation models are AI models that are trained on a large set of unlabeled data, can be used for various tasks, and can apply knowledge from one situation to another. Over the last five years, these models have rapidly advanced the field of natural language processing (NLP), and IBM is pioneering applications of foundation models beyond language.
Earth observations are being collected at unprecedented rates and volumes, allowing scientists to study and monitor our planet. However, new and innovative approaches are required to extract knowledge from these vast data resources. This work aims to make it easier for researchers to analyse and draw conclusions from large datasets. IBM’s foundation model technology has the potential to accelerate the discovery and analysis of these data, allowing scientists to advance their understanding of the Earth and respond to climate-related issues more quickly.
IBM and NASA intend to create new technologies to extract information from Earth observations. For example, one project will train an IBM geospatial intelligence foundation model on NASA’s Harmonized Landsat Sentinel-2 (HLS) dataset, a record of land cover and land use changes captured by Earth-orbiting satellites. By analysing petabytes of satellite data to identify changes in the geographic footprint of phenomena such as natural disasters, cyclical crop yields, and wildlife habitats, this foundation model technology will help researchers critically assess our planet’s environmental systems.
This collaboration is also expected to produce an easily searchable corpus of Earth science literature. To organise the literature and make it easier to discover new knowledge, IBM created an NLP model trained on nearly 300,000 Earth science journal articles. The fully trained model uses PrimeQA, IBM’s open-source, multilingual question-answering system, and is one of the largest AI workloads trained on Red Hat’s OpenShift software to date. The new language model for Earth science could be incorporated into NASA’s scientific data management and stewardship processes and serve as a resource for researchers.
Other potential IBM-NASA collaborative projects in this agreement include developing a foundation model for weather and climate prediction using MERRA-2, an atmospheric observation dataset. This collaboration is part of NASA’s Open-Source Science Initiative, a decade-long commitment to creating an inclusive, transparent, and collaborative open science community.
“Foundation models have proven successful in natural language processing, and it’s time to expand that to new domains and modalities important for business and society. Applying foundation models to geospatial, event-sequence, time-series, and other non-language factors within Earth science data could make enormously valuable insights and information suddenly available to a much wider group of researchers, businesses, and citizens. Ultimately, it could facilitate a larger number of people working on some of our most pressing climate issues.”
Raghu Ganti, principal researcher at IBM.
“The beauty of foundation models is they can potentially be used for many downstream applications. Building these foundation models cannot be tackled by small teams. You need teams across different organizations to bring their different perspectives, resources, and skill sets.”
Rahul Ramachandran, senior research scientist at NASA’s Marshall Space Flight Center in Huntsville, Alabama.