Machine Learning Predicts Toxic Metal Levels in Marine Systems

*Important notice: This news story reports on an unedited version of the paper, which has been accepted and is awaiting final editing. Scientific Reports sometimes publishes preliminary scientific reports that are not fully edited; these should not be regarded as conclusive or treated as established information.

Machine learning models with feature selection accurately predict aluminum concentrations using fewer variables. This improves interpretability and efficiency for environmental monitoring of marine ecosystems.

Study: Forecasting toxic metal concentrations in an inland sea ecosystem with machine learning algorithms. Image Credit: Rafael Matus/Shutterstock

In an article published in the journal Scientific Reports, researchers used machine learning (ML) and feature selection to predict aluminum (Al) concentrations in an inland marine ecosystem. They investigated relationships between Al and other elements and found that accurate predictions are possible using a reduced subset of key elements rather than all features, yielding simpler, more interpretable models.

Toxic Metal Pollution in Marine Ecosystems

Toxic metal pollution from industrial and agricultural activities, particularly Al, threatens marine ecosystems. Al accumulates in saltwater, where it is toxic to organisms and poses risks to human health. Previous work has successfully applied ML to predict heavy metal concentrations but often relied on large, complex feature sets, limiting model interpretability.

It remained unclear whether accurate predictions can be achieved using a reduced subset of chemical elements rather than full datasets. This study addressed that gap by applying ML combined with feature selection to predict Al concentrations in water and sediment from the Sea of Marmara. It aimed to identify key elemental predictors, improving prediction efficiency and model interpretability for environmental monitoring.

Data Collection, Feature Selection, and Predictive Modeling

Researchers collected water and sediment samples from 17 locations in the Sea of Marmara over ten days. Sediment samples were dried in an oven, and both sample types were subjected to chemical digestion with acids in a microwave device. After filtration and dilution, the concentrations of 15 elements (including Al, Iron, Lead, and Zinc) were measured using inductively coupled plasma optical emission spectroscopy (ICP-OES) analysis. 

To improve prediction accuracy and reduce complexity, two feature selection techniques were applied. Recursive feature elimination (RFE) works by repeatedly removing the least important features and retraining the model until the optimal subset remains. A genetic algorithm (GA) takes a different approach, mimicking biological evolution by testing various feature combinations and using crossover and mutation to evolve toward the best solution. GA is more computationally intensive but excels at uncovering complex, nonlinear relationships in data. 
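
The two selection strategies described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`rfe`, `ga_select`), the toy importance scores, and the toy fitness function are all hypothetical stand-ins, and the GA uses truncation selection, single-point crossover, and elitism as assumed design choices that the article does not specify.

```python
import random

def rfe(features, importance_fn, n_keep):
    """Recursive feature elimination: drop the weakest feature until n_keep remain.

    importance_fn(features) is expected to refit a model on `features` and
    return a dict mapping each feature name to an importance score.
    """
    selected = list(features)
    while len(selected) > n_keep:
        scores = importance_fn(selected)
        weakest = min(selected, key=lambda f: scores[f])
        selected.remove(weakest)
    return selected

def ga_select(n_features, fitness_fn, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Genetic algorithm over bit masks (1 = feature kept); higher fitness is better."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        history.append(fitness_fn(ranked[0]))
        parents = ranked[:pop_size // 2]        # truncation selection
        children = [ranked[0][:]]               # elitism: best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness_fn)
    history.append(fitness_fn(best))
    return best, history

# Toy stand-ins (hypothetical, not the study's data or models).
TOY_SCORES = {"x1": 0.5, "x2": 0.1, "x3": 0.3, "x4": 0.05, "x5": 0.2}

def toy_importance(features):
    # A real importance_fn would refit a model on `features` at every call.
    return {f: TOY_SCORES[f] for f in features}

INFORMATIVE = {0, 3, 5}  # hypothetical positions of the "useful" features

def toy_fitness(mask):
    # Reward keeping informative features and penalize subset size.
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

kept = rfe(list(TOY_SCORES), toy_importance, n_keep=2)
best_mask, history = ga_select(n_features=14, fitness_fn=toy_fitness)
```

In a real workflow the fitness function would typically be a validation score (for example, negative RMSE) of a model trained on the masked features, which is where most of the GA's extra computational cost comes from.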

Six predictive models were employed. Multiple linear regression (MLR) establishes linear relationships between elements but requires strict statistical assumptions. Elastic net combines two regularization techniques, the L1 (lasso) and L2 (ridge) penalties, to prevent overfitting while handling correlated features. Type-1 fuzzy functions model uncertainty without needing expert rules, using clustering to capture relationships.
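
For readers who want the elastic net penalty written out, one common parameterization is shown below. This is an assumption for illustration: the article does not give the exact form used in the paper. Here $X$ is the matrix of predictor concentrations, $y$ the Al concentrations, $n$ the number of samples, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mix between the L1 and L2 terms.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda \left( \alpha \lVert \beta \rVert_1
  + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right)
```

Setting $\alpha = 1$ recovers the lasso and $\alpha = 0$ recovers ridge regression; intermediate values let the model shrink correlated predictors together while still setting some coefficients to zero.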

Extreme gradient boosting (XGBoost) builds decision trees sequentially, each correcting the errors of the previous ones, with built-in regularization to avoid overfitting. Random forest averages many decision trees for robust predictions. Finally, ensemble learning methods combine multiple models, using simple averaging or stacked architectures, to achieve superior predictive performance. All models underwent hyperparameter tuning to optimize their performance for predicting Al concentrations.
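
The idea of trees that each correct the previous errors can be illustrated with a deliberately simplified sketch: plain gradient boosting for squared error using one-split "stumps" on a single feature. It is not XGBoost itself, which adds a regularized objective and second-order gradient information, and the function names and toy data below are hypothetical. A simple averaging ensemble is included for comparison.

```python
def fit_stump(x, residuals):
    """Least-squares regression stump: one threshold, one mean per side."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda xi: lmean if xi <= threshold else rmean

def boost(x, y, n_rounds=20, learning_rate=0.3):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(y) / len(y)              # start from the mean prediction
    predictions = [base] * len(y)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        mse_history.append(sum(r * r for r in residuals) / len(y))
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each correction by the learning rate before adding it.
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]

    def predict(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)

    return predict, mse_history

def average_ensemble(models, xi):
    """Simple averaging ensemble: mean of the individual model predictions."""
    return sum(m(xi) for m in models) / len(models)

# Toy one-feature data (hypothetical values, not measurements from the study).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 5.2, 5.1, 4.9]
predict, mse_history = boost(x, y)
```

The training error recorded in `mse_history` never increases from one round to the next, which is the sequential error correction the paragraph describes; the learning rate plays the role of a simple shrinkage regularizer.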

Experimental Findings and Geochemical Interpretation

Data were split into 50% training, 20% validation, and 30% testing sets. The two feature selection methods, RFE and GA, were applied, reducing the number of features from 14 to six in both the water and sediment datasets.
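
A 50/20/30 partition like the one described can be sketched as follows. The article does not say whether the split was random or how many samples were involved, so the shuffling, the seed, the sample count of 100, and the function name `split_indices` are all assumptions made for illustration.

```python
import random

def split_indices(n_samples, train=0.5, val=0.2, seed=0):
    """Shuffle sample indices and cut them into train, validation, and test parts.

    The test set receives whatever remains after the train and validation cuts.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train)
    n_val = int(n_samples * val)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]
    return train_idx, val_idx, test_idx

# Hypothetical sample count chosen so the proportions divide evenly.
train_idx, val_idx, test_idx = split_indices(100)
```

Keeping the validation set separate from the test set matters here because both feature selection and hyperparameter tuning are fitted to the data; the test set should be touched only for the final error estimate.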

For water samples, the best individual model was XGBoost combined with GA-selected features, achieving a root mean square error (RMSE) of 0.0354. This improved prediction by 44.4% compared to using all features and by 22.2% compared to standard MLR. Shapley additive explanations (SHAP) analysis revealed Copper and Chromium as the most influential predictors. 
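
For reference, RMSE and a percentage improvement can be computed as in the sketch below. The article does not state how the improvement figures were derived; the relative reduction in RMSE used here is an assumption, and the numbers are toy values rather than results from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline model (assumed definition)."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Toy values (hypothetical): the second model halves the only error.
y_true = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.0, 2.0, 3.0, 6.0]
improved_pred = [1.0, 2.0, 3.0, 5.0]

baseline_rmse = rmse(y_true, baseline_pred)
improved_rmse = rmse(y_true, improved_pred)
gain = relative_improvement(baseline_rmse, improved_rmse)
```

Because RMSE carries the units of the predicted quantity, values for water and sediment are not directly comparable with each other; the percentage improvement within each sample type is the more meaningful comparison.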

For sediment samples, XGBoost with GA-selected features again performed best, with an RMSE of 356.57, a 20% improvement over the full dataset and 45% better than MLR. SHAP analysis highlighted Boron and Cadmium as key features. 
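
The SHAP rankings reported for both sample types rest on Shapley values, which average each feature's marginal contribution over all orders in which features could be revealed to the model. The brute-force sketch below computes them exactly for a tiny toy model against a single baseline input. It is an illustration under those assumptions only: practical SHAP tooling relies on much faster algorithms, and the model, instance, and baseline here are hypothetical.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Features start at their baseline values and are switched to the
    instance's values one at a time; each switch's effect on the model
    output is credited to the feature that was switched.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = instance[i]
            value = model(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]

def toy_model(v):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * v[0] + v[1] * v[2]

phi = shapley_values(toy_model, instance=[1.0, 1.0, 1.0],
                     baseline=[0.0, 0.0, 0.0])
```

In this toy case the additive feature receives its full effect (2.0), while the two interacting features split their joint contribution equally (0.5 each), and the attributions sum to the difference between the model output at the instance and at the baseline. Enumeration grows factorially with the number of features, so it is only practical for very small inputs.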

Feature selection significantly improved tree-based models (XGBoost, Random Forest) but not linear models (MLR), because tree-based methods better capture nonlinear relationships and interactions. The stacking ensemble method also showed competitive performance. 

Geochemically, selected features in water reflect anthropogenic inputs and hydrographic mixing, while sediment features indicate both natural lithogenic background and human influences. Al distribution in the Sea of Marmara aligns with regional studies showing Al as primarily lithogenic, with contributions from industrialization and urbanization.

Implications for Interpretable Environmental Monitoring

This study successfully demonstrated that combining ML with feature selection can accurately predict Al concentrations in water and sediment from the Sea of Marmara using a reduced set of elements. Among the models tested, XGBoost combined with GA-based feature selection achieved the best performance, reducing the number of predictors from 14 to 6 while improving prediction accuracy by 44.4% for water samples and 20% for sediment samples.

Key predictors included Copper, Chromium, Boron, and Cadmium. These findings confirm that feature selection enhances model interpretability and efficiency without sacrificing performance. Future research should explore deep learning approaches and incorporate seasonal and spatial variations across broader sampling areas to further improve predictive capabilities for environmental monitoring.

Journal Reference

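Ucan, A., Tak, N., Hocaoglu-Ozyigit, A., & Ozyigit, I. I. (2026). Forecasting toxic metal concentrations in an inland sea ecosystem with machine learning algorithms. Scientific Reports. DOI:10.1038/s41598-026-48252-5, https://www.nature.com/articles/s41598-026-48252-5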

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Nandi, Soham. (2026, April 23). Machine Learning Predicts Toxic Metal Levels in Marine Systems. AZoRobotics. Retrieved on April 23, 2026 from https://www.azorobotics.com/News.aspx?newsID=16384.

  • MLA

    Nandi, Soham. "Machine Learning Predicts Toxic Metal Levels in Marine Systems". AZoRobotics. 23 April 2026. <https://www.azorobotics.com/News.aspx?newsID=16384>.

  • Chicago

    Nandi, Soham. "Machine Learning Predicts Toxic Metal Levels in Marine Systems". AZoRobotics. https://www.azorobotics.com/News.aspx?newsID=16384. (accessed April 23, 2026).

  • Harvard

    Nandi, Soham. 2026. Machine Learning Predicts Toxic Metal Levels in Marine Systems. AZoRobotics, viewed 23 April 2026, https://www.azorobotics.com/News.aspx?newsID=16384.
