NASA Uses AI to Sharpen Metadata Tagging for Earth Science Datasets

NASA has rolled out an upgraded artificial intelligence tool to improve how Earth science datasets are tagged and discovered. The Global Change Master Directory Keyword Recommender (GKR), powered by the INDUS language model, now automates keyword suggestions with greater speed and precision, helping scientists more easily find the data they need.

Image snapshot taken from NASA Worldview of NASA’s Global Precipitation Measurement (GPM) mission on March 15, 2025 showing heavy rain across the southeastern US with an overlay of the GCMD Keyword Recommender for Earth Science, Atmosphere, Precipitation, Droplet Size. Image Credit: NASA Worldview

Built on a deep-learning architecture trained on 66 billion words from scientific literature, GKR addresses key challenges in metadata tagging, from class imbalance to rare keyword recognition. The latest update significantly expands keyword coverage and introduces techniques that improve performance on complex classification tasks.

The Challenge: Managing a Flood of Data

Earth science research produces massive amounts of data—from satellite images and atmospheric readings to climate projections and ocean measurements. However, collecting the data is only half the battle. For scientists to actually find and use it, each dataset needs to be described with clear, consistent metadata: keywords that capture what the data is about, how it was collected, and why it matters.

NASA’s Global Change Master Directory (GCMD) provides a controlled vocabulary for this purpose: a standardized list of thousands of keywords. Tagging datasets with the right terms, however, has traditionally been a manual task. It’s labor-intensive, prone to human error, and often inconsistent, especially as the number and complexity of datasets continue to grow.
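To make the idea of a controlled vocabulary concrete, here is a minimal Python sketch of checking free-text tags against a fixed keyword list. It is an illustration only, not NASA's implementation: the two-entry vocabulary is a toy built from the keyword path in the image caption above, and the " > " separator, the upper-casing, and the function names `normalize` and `split_tags` are assumptions made for this example.

```python
# Toy vocabulary for illustration; the real GCMD list holds thousands of keywords.
# Both entries come from the keyword path shown in the image caption.
CONTROLLED_VOCABULARY = {
    ("EARTH SCIENCE", "ATMOSPHERE", "PRECIPITATION"),
    ("EARTH SCIENCE", "ATMOSPHERE", "PRECIPITATION", "DROPLET SIZE"),
}


def normalize(keyword):
    """Split an 'A > B > C' keyword path into a tuple of upper-cased levels."""
    return tuple(part.strip().upper() for part in keyword.split(">") if part.strip())


def split_tags(tags):
    """Partition tags into those found in the vocabulary and those that are not."""
    accepted, rejected = [], []
    for tag in tags:
        if normalize(tag) in CONTROLLED_VOCABULARY:
            accepted.append(tag)
        else:
            rejected.append(tag)
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = split_tags([
        "Earth Science > Atmosphere > Precipitation > Droplet Size",
        "rain stuff",  # a hand-typed tag that is not in the vocabulary
    ])
    print("accepted:", ok)
    print("rejected:", bad)
```

A check like this only catches tags that are not in the list. Choosing which valid keywords actually describe a dataset is the part that a recommender such as GKR is meant to help with.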

To address this, NASA’s Office of Data Science and Informatics created the GCMD Keyword Recommender (GKR), an AI-based tool that suggests relevant keywords automatically. While the original version was a major step forward, the increasing scale and diversity of NASA’s data called for a more sophisticated approach.

The Upgrade: More Keywords, Smarter Recommendations

The new GKR is powered by INDUS, a transformer-based language model designed specifically for scientific language. Trained on publications from Earth science, astrophysics, and other NASA research areas, INDUS understands the context and nuance of technical terms, enabling more accurate keyword suggestions.

With this upgrade, GKR now supports more than 3,200 keywords, seven times the number supported by the previous version. It also uses a technique called focal loss, which helps the model handle rare or underused keywords more effectively: examples the model already classifies well contribute less to training, so learning concentrates on the harder ones. Combined with a significantly expanded training set (up from 2,000 to 43,000 metadata records), the model delivers sharper, more relevant results.
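For readers curious about what focal loss does, the following Python sketch shows the standard binary form applied to each keyword independently. It is a sketch under stated assumptions, not GKR's actual training code: the article does not say which variant or settings NASA uses, so the `gamma` and `alpha` defaults (common choices in the focal loss literature) and the function names `binary_focal_loss` and `multilabel_focal_loss` are assumptions made for illustration.

```python
import math


def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25):
    """Focal loss for one keyword, treated as a yes/no label.

    prob  -- predicted probability that the keyword applies (0..1)
    label -- 1 if the keyword truly applies, else 0
    gamma -- focusing strength (assumed value); 0 gives weighted cross-entropy
    alpha -- weight on positive labels (assumed value)
    """
    # p_t is the probability the model assigned to the correct answer.
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    # The (1 - p_t) ** gamma factor shrinks the loss on confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average the per-keyword focal loss over all keywords for one record."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = sum(binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels))
    return total / len(probs)


if __name__ == "__main__":
    easy = binary_focal_loss(0.9, 1)  # keyword applies, model is confident
    hard = binary_focal_loss(0.1, 1)  # keyword applies, model nearly misses it
    print(f"easy example loss: {easy:.6f}")
    print(f"hard example loss: {hard:.6f}")
```

In this form, a keyword the model nearly misses produces a far larger loss than one it already predicts confidently, which is how the technique shifts attention toward rare or difficult keywords.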

Why It Matters

Good metadata is more than a backend concern: it directly affects how easily researchers can find and use data. With better keyword tagging, datasets become more searchable, more discoverable, and more useful across scientific disciplines.

The upgraded GKR is already making a difference on platforms like Earthdata Search and the International Data Network, where improved tagging helps researchers quickly zero in on the data they need. It also reduces the burden on scientists and data managers, freeing them up to focus on analysis instead of metadata entry.

What’s Next

The upgraded GKR is just the beginning. NASA plans to continue refining the tool, incorporating feedback from researchers and exploring ways to make keyword recommendations even more context-aware. Future versions may expand beyond the current GCMD keyword set, supporting custom vocabularies or multilingual tagging for broader international use.

The INDUS model powering GKR is also being integrated into other NASA systems, like the Science Discovery Engine, where it’s improving how researchers search across vast collections of scientific content. There’s also growing interest in adapting these tools for other domains, including climate adaptation, planetary science, and even biomedical research.

As the volume and complexity of scientific data continue to grow, tools like GKR will play a critical role in managing that information—helping scientists spend less time digging through datasets and more time generating insights.

Citations

Cite this article as:

    Nandi, Soham. (2025, July 22). NASA Uses AI to Sharpen Metadata Tagging for Earth Science Datasets. AZoRobotics. Retrieved on July 22, 2025 from https://www.azorobotics.com/News.aspx?newsID=16118.
