AI Helps Scientists Pinpoint Where Groundwater is No Longer Safe in India

By merging geochemical insights with artificial intelligence (AI), researchers have uncovered critical contamination across Kasganj’s groundwater and identified a powerful data-driven tool to forecast and manage water safety.

Study: Integrated groundwater quality assessment using geochemical modeling and machine learning approach in Northern India.

In an article published in the Scientific Reports journal (a Nature Portfolio publication), researchers assessed groundwater quality in Kasganj, India, for drinking and irrigation. They found significant contamination, with many samples exceeding safe limits for total dissolved solids (TDS) and fluoride.

The study combined water quality indexing (WQI), geochemical analysis, and machine learning (ML) models, and identified random forest (RF) as the best predictor of the index.

Groundwater Under Growing Pressure

Groundwater is a critical global resource, especially in arid and semi-arid regions, where it is the primary source of drinking and irrigation water. However, rapid urbanization, industrialization, and population growth have led to its widespread contamination, posing severe risks to human health and agriculture.

Traditionally, the WQI and irrigation WQI (IWQI) have been vital tools for simplifying complex water quality data into a comprehensible format for assessment and management. Previous research has effectively utilized these indices and begun exploring ML models like artificial neural networks (ANNs) and support vector machines (SVMs) for water quality prediction. Despite this progress, a significant research gap exists in the application and comparison of more advanced ML algorithms, specifically RF and extreme gradient boosting (XGB), for predicting WQI.

The paper addresses this gap by conducting a comprehensive physicochemical analysis of groundwater in Kasganj, India, and applying three models, RF, ANN, and XGB, to predict WQI with high accuracy. Combining traditional indexing with predictive modeling provides a robust framework for identifying contamination hotspots and informing targeted remediation strategies.

Combining Geochemical Modeling with Machine Learning

The study area has a sub-humid climate with hot summers and relies on groundwater from alluvial aquifers, where water levels fluctuate seasonally. The research was conducted from August 2023 to July 2024, during which 115 groundwater samples were systematically collected from 23 pre-identified sites using tube wells, hand pumps, and submersibles.

The samples were stored in pre-washed high-density polypropylene (HDPP) bottles and analyzed for twelve key physicochemical parameters, including pH, TDS, fluoride, and various ions. Analytical techniques including titration, flame photometry, and ultraviolet (UV) spectrophotometry were employed, with an estimated error of less than ±5 %.

The data analysis involved multiple approaches. Geochemical modeling using PHREEQC software calculated mineral saturation indices to understand the water's interaction with aquifer materials like carbonate rocks. Furthermore, the researchers calculated both WQI and IWQI by integrating the measured parameters, classifying the water into categories from excellent to unsuitable for drinking and irrigation based on established standards.
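To make the indexing step concrete, here is a minimal sketch of a weighted arithmetic WQI, one common formulation. It is an assumption that the study used this exact form. The function name, the weights, and the TDS limit are illustrative placeholders, not values from the paper; the fluoride limit of 1.5 mg/L is the WHO guideline value. Parameters such as pH, which are usually rated against an ideal value, are left out for brevity.

```python
def weighted_wqi(concentrations, limits, weights):
    """Weighted arithmetic WQI: sum of relative weight x quality rating.

    All three arguments are dicts keyed by parameter name (hypothetical layout).
    """
    total_weight = sum(weights[name] for name in concentrations)
    index = 0.0
    for name, measured in concentrations.items():
        relative_weight = weights[name] / total_weight   # W_i
        quality_rating = measured / limits[name] * 100   # q_i, in percent of the limit
        index += relative_weight * quality_rating
    return index


# Illustrative sample: values in mg/L, weights are placeholders.
sample = {"TDS": 1000.0, "fluoride": 1.5}
limits = {"TDS": 500.0, "fluoride": 1.5}   # TDS limit assumed for illustration
weights = {"TDS": 4, "fluoride": 5}
print(round(weighted_wqi(sample, limits, weights), 2))
```

Under this formulation a sample sitting exactly at every limit scores 100, so higher values indicate poorer water; the category boundaries themselves come from the study's own classification criteria.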

A key and innovative aspect of the methodology was the application of three advanced machine learning models, namely, RF, ANN, and XGB, to predict the WQI. RF uses an ensemble of decision trees resistant to overfitting, ANN mimics biological neurons to model complex nonlinear relationships, and XGB sequentially builds trees to correct errors from previous ones. The advantages and disadvantages of each model were noted, with careful steps described for their implementation, including data preprocessing, k-fold cross-validation, and hyperparameter tuning to ensure robust and accurate predictions.
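The k-fold cross-validation mentioned above can be sketched in plain Python. This is not the authors' pipeline: the helper names are hypothetical, and a trivial mean-value baseline stands in for RF, ANN, or XGB so that the example has no dependencies. Any model exposed through the same `fit` interface could be dropped in, and hyperparameter tuning would amount to calling `cross_validated_rmse` once per candidate setting and keeping the best score.

```python
import math
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, test_idx


def cross_validated_rmse(features, targets, fit, k=5):
    """Average held-out RMSE of a model built by fit(X_train, y_train) -> predict(row)."""
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(targets), k):
        predict = fit([features[i] for i in train_idx], [targets[i] for i in train_idx])
        squared = [(targets[i] - predict(features[i])) ** 2 for i in test_idx]
        fold_errors.append(math.sqrt(sum(squared) / len(squared)))
    return sum(fold_errors) / k


def fit_mean_baseline(train_features, train_targets):
    """Stand-in model (assumption): always predicts the mean training target."""
    mean_target = sum(train_targets) / len(train_targets)
    return lambda row: mean_target


# Synthetic data only: 23 rows with one made-up feature and a made-up WQI target.
X = [[float(i)] for i in range(23)]
y = [40.0 + 3.0 * i for i in range(23)]
print(cross_validated_rmse(X, y, fit_mean_baseline, k=5))
```

Because each sample is held out exactly once, the averaged error reflects performance on data the model did not see during fitting, which is the property the study relies on when comparing models.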

Critical Contamination and Model Performance 

The analysis revealed significant contamination, with TDS alarmingly high and fluoride levels exceeding the World Health Organization (WHO) limit in many samples. Hydrogeochemical analysis, using Piper and Gibbs diagrams, identified the water type predominantly as Ca-Mg-Cl, with rock-water interactions being the primary source of ions. A correlation analysis suggested fluoride mobilization is linked to local mineral weathering.
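For readers unfamiliar with Gibbs diagrams, the two ratios plotted against TDS can be computed as in the sketch below. The formulas are the standard Gibbs ratios with ion concentrations in meq/L; the function names and the sample values are illustrative and are not taken from the study.

```python
def gibbs_cation_ratio(na, k, ca):
    """(Na + K) / (Na + K + Ca), all concentrations in meq/L."""
    return (na + k) / (na + k + ca)


def gibbs_anion_ratio(cl, hco3):
    """Cl / (Cl + HCO3), both concentrations in meq/L."""
    return cl / (cl + hco3)


# Made-up sample, meq/L.
print(gibbs_cation_ratio(na=3.0, k=1.0, ca=4.0))
print(gibbs_anion_ratio(cl=1.0, hco3=3.0))
```

Each sample is then placed on the diagram by plotting TDS on a logarithmic axis against one of these ratios; the region it falls in indicates whether precipitation, rock-water interaction, or evaporation dominates the water chemistry.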

The WQI classified 60.87 % of samples as "unfit" for drinking, 26.08 % as "very poor", and 13.04 % as "moderately poor", according to the study’s classification criteria. For irrigation, indices such as the sodium adsorption ratio (SAR) and magnesium hazard (MH) were calculated to assess suitability.
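The two irrigation indices named above have standard definitions, sketched here for illustration. Both take ion concentrations in meq/L; the conversion table uses rounded equivalent weights, and the function names and sample values are assumptions rather than anything from the paper.

```python
import math

# Equivalent weights in mg per meq (atomic mass divided by ionic charge), rounded.
EQUIVALENT_WEIGHT = {"Na": 22.99, "Ca": 20.04, "Mg": 12.15}


def to_meq_per_l(ion, mg_per_l):
    """Convert a concentration from mg/L to meq/L."""
    return mg_per_l / EQUIVALENT_WEIGHT[ion]


def sodium_adsorption_ratio(na, ca, mg):
    """SAR = Na / sqrt((Ca + Mg) / 2), concentrations in meq/L."""
    return na / math.sqrt((ca + mg) / 2)


def magnesium_hazard(ca, mg):
    """MH = Mg / (Ca + Mg) * 100, concentrations in meq/L."""
    return mg / (ca + mg) * 100


# Made-up sample in mg/L, converted before the indices are computed.
na = to_meq_per_l("Na", 92.0)
ca = to_meq_per_l("Ca", 80.0)
mg = to_meq_per_l("Mg", 36.0)
print(round(sodium_adsorption_ratio(na, ca, mg), 2), round(magnesium_hazard(ca, mg), 1))
```

Higher SAR values point to a greater sodium hazard for soil structure, and MH values above 50 are conventionally treated as unsuitable for irrigation.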

A major component of the results was testing three machine learning models to predict the WQI. While all models (XGB, ANN, and RF) performed well, the RF model demonstrated the best predictive accuracy and generalization on unseen test data, achieving the highest R² value (0.951) and the lowest error, with a root mean square error (RMSE) of 5.97.
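As a reference for how the two reported metrics are defined, the sketch below computes the coefficient of determination and the RMSE from observed and predicted values. The numbers are made up for illustration and do not reproduce the study's results.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Made-up WQI values for four held-out samples.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

RMSE is expressed in the same units as the WQI itself, so it reads as a typical prediction error in index points, while the coefficient of determination gives the share of variance in the observed index that the model explains.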

The discussion confirmed RF's superiority, in line with other recent studies, and positioned the combination of traditional indexing with machine learning as a reliable method for water quality monitoring and management.

Data-Driven Insights for Safer Groundwater

The study revealed that groundwater in Kasganj, India, faces critical contamination, with more than 60 % of samples deemed unfit for drinking due to excessive TDS and fluoride levels exceeding WHO limits.

By combining traditional water quality indices with advanced machine learning models, the researchers created a powerful predictive framework for assessing groundwater health. Among the models tested, the Random Forest algorithm delivered the most reliable results, highlighting its value as a practical tool for long-term water quality monitoring. These findings underscore the urgent need for sustainable groundwater management and targeted remediation efforts to safeguard this vital resource. 

Journal Reference

Islam et al. (2025). Integrated groundwater quality assessment using geochemical modeling and machine learning approach in Northern India. Scientific Reports, 15(1). DOI:10.1038/s41598-025-21592-4. https://www.nature.com/articles/s41598-025-21592-4


Citations

Nandi, Soham. (2025, November 04). AI Helps Scientists Pinpoint Where Groundwater is No Longer Safe in India. AZoRobotics. Retrieved on November 04, 2025 from https://www.azorobotics.com/News.aspx?newsID=16234.
