NASA has rolled out an upgraded artificial intelligence tool to improve how Earth science datasets are tagged and discovered. The Global Change Master Directory Keyword Recommender (GKR), powered by the INDUS language model, now automates keyword suggestions with greater speed and precision, helping scientists more easily find the data they need.
Image snapshot taken from NASA Worldview of NASA’s Global Precipitation Measurement (GPM) mission on March 15, 2025, showing heavy rain across the southeastern US, with an overlay of the GCMD Keyword Recommender for Earth Science, Atmosphere, Precipitation, Droplet Size. Image Credit: NASA Worldview
Built on a deep-learning architecture trained on 66 billion words from scientific literature, GKR addresses key challenges in metadata tagging, from class imbalance to rare keyword recognition. The latest update significantly expands keyword coverage and introduces techniques that improve performance on complex classification tasks.
The Challenge: Managing a Flood of Data
Earth science research produces massive amounts of data—from satellite images and atmospheric readings to climate projections and ocean measurements. However, collecting the data is only half the battle. For scientists to actually find and use it, each dataset needs to be described with clear, consistent metadata: keywords that capture what the data is about, how it was collected, and why it matters.
NASA’s Global Change Master Directory (GCMD) provides a controlled vocabulary for this purpose: a standardized list of thousands of keywords. However, tagging datasets with the right terms has traditionally been a manual task. It’s labor-intensive, prone to human error, and often inconsistent, especially as the number and complexity of datasets continue to grow.
To address this, NASA’s Office of Data Science and Informatics created the Keyword Recommender (GKR)—an AI-based tool that suggests relevant keywords automatically. While the original version was a major step forward, the increasing scale and diversity of NASA’s data called for a more sophisticated approach.
The Upgrade: More Keywords, Smarter Recommendations
The new GKR is powered by INDUS, a transformer-based language model designed specifically for scientific language. Trained on publications from Earth science, astrophysics, and other NASA research areas, INDUS understands the context and nuance of technical terms, enabling more accurate keyword suggestions.
With this upgrade, GKR now supports more than 3,200 keywords, seven times as many as the previous version. It also uses a technique called focal loss, which helps the model handle rare or underused keywords more effectively: it down-weights examples the model already classifies confidently, so training concentrates on the harder, less common cases. Combined with a significantly expanded training set (up from 2,000 to 43,000 metadata records), the model delivers sharper, more relevant results.
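To make the idea of focal loss concrete, here is a minimal sketch in Python. It uses the commonly cited binary formulation, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), applied to a single keyword label. The article does not describe GKR's actual loss configuration, so the function names and the values gamma = 2.0 and alpha = 0.25 are illustrative assumptions, not NASA's settings.

```python
import math


def cross_entropy(prob, target, eps=1e-12):
    """Plain binary cross-entropy for one keyword label (baseline for comparison)."""
    p_t = prob if target == 1 else 1.0 - prob
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one keyword label.

    prob   : model's predicted probability that the keyword applies
    target : 1 if the keyword really applies, else 0
    gamma  : focusing strength (assumed value; 0 removes the focusing effect)
    alpha  : weight on positive examples (assumed value)
    """
    p_t = prob if target == 1 else 1.0 - prob        # probability given to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)                    # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predictions: a common keyword the model gets right,
# and a rare keyword the model mostly misses.
examples = {
    "easy (p=0.95, keyword applies)": (0.95, 1),
    "hard (p=0.10, keyword applies)": (0.10, 1),
}

for name, (p, y) in examples.items():
    print(f"{name}: cross-entropy={cross_entropy(p, y):.4f}, "
          f"focal={focal_loss(p, y):.6f}")
```

In this sketch the confidently correct prediction contributes almost nothing to the focal loss, while the missed keyword still carries a substantial penalty. That widening gap between easy and hard examples is what steers training toward rare keywords.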
Why It Matters
Good metadata is more than a backend concern: it directly affects how easily researchers can find and use data. With better keyword tagging, datasets become more searchable, more discoverable, and more useful across scientific disciplines.
The upgraded GKR is already making a difference on platforms like Earthdata Search and the International Data Network, where improved tagging helps researchers quickly zero in on the data they need. It also reduces the burden on scientists and data managers, freeing them up to focus on analysis instead of metadata entry.
What’s Next
The upgraded GKR is just the beginning. NASA plans to continue refining the tool, incorporating feedback from researchers and exploring ways to make keyword recommendations even more context-aware. Future versions may expand beyond the current GCMD keyword set, supporting custom vocabularies or multilingual tagging for broader international use.
The INDUS model powering GKR is also being integrated into other NASA systems, like the Science Discovery Engine, where it’s improving how researchers search across vast collections of scientific content. There’s also growing interest in adapting these tools for other domains, including climate adaptation, planetary science, and even biomedical research.
As the volume and complexity of scientific data continue to grow, tools like GKR will play a critical role in managing that information—helping scientists spend less time digging through datasets and more time generating insights.