Researchers at the University of Florida’s Institute of Food and Agricultural Sciences have used consumer sensory panel information together with metabolomic (chemical) profile data to build machine learning models that predict the flavors of fruits.
Consumers perceive a decline in the flavor of commercially grown produce. This has been partly driven by crop producers who focus on maximizing yield while neglecting quality. Consumer sensory panels are limited in scope and difficult to scale. Machine learning models promise to increase throughput in flavor evaluations and help crop producers integrate the design of attractive flavor profiles earlier in their breeding cycle.
Due to cost and logistical limitations, breeders do not typically employ consumer panels in their programs.
Prof. Harry Klee, Institute of Food and Agricultural Sciences, University of Florida
A Brief Introduction to Machine Learning Models
Machine learning refers to computational algorithms that learn from data and improve with experience. Rather than being explicitly programmed for a task, these algorithms use “training data” to learn how to make predictions, and they generally become more accurate as they are given more relevant data. There are three main types of machine learning algorithms:
- Supervised Learning: Supervised learning algorithms learn to predict an outcome from a set of input variables by training on examples for which the outcome is already known. Training continues until the mapping from inputs to outputs reaches a satisfactory level of accuracy. Supervised learning algorithms include Random Forest and Decision Tree.
- Unsupervised Learning: These algorithms do not predict a specific outcome but rather are used to extract relationships, group or summarize data. K-means and Apriori are examples of these algorithms.
- Reinforcement Learning: Much like human decision making, reinforcement learning algorithms learn through continuous trial and error within a specific environment, receiving feedback on each decision. In other words, they learn to make better decisions through experience. Such problems are commonly formalized as a Markov Decision Process, and Q-learning is one example of an algorithm that solves them.
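As a minimal illustration of the difference between the first two categories, the sketch below uses invented sugar measurements and invented "sweet" labels (not data from the Florida study). The helper names `fit_stump` and `kmeans_1d` are hypothetical: the first is a one-split decision tree that learns from inputs and labels, the second is a one-dimensional k-means that groups the same inputs without labels.

```python
# Toy illustration only: all values and labels are invented, not from the study.

def fit_stump(xs, labels):
    """Supervised: pick the threshold on x that best separates two labels."""
    best_threshold, best_correct = None, -1
    candidates = sorted(set(xs))
    # Try the midpoint between each pair of neighboring values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        correct = sum((x > threshold) == label for x, label in zip(xs, labels))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def kmeans_1d(xs, k=2, iterations=20):
    """Unsupervised: group values into k clusters (k >= 2) without labels."""
    ordered = sorted(xs)
    # Spread the starting centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


# Hypothetical sugar content (arbitrary units) for eight fruit samples.
sugar = [2.1, 2.4, 2.8, 3.0, 5.9, 6.3, 6.8, 7.2]
# Hypothetical panel label: True if tasters called the sample "sweet".
is_sweet = [False, False, False, False, True, True, True, True]

threshold = fit_stump(sugar, is_sweet)   # learned from inputs AND labels
centers = kmeans_1d(sugar, k=2)          # learned from inputs alone

print(f"Supervised threshold: {threshold:.2f}")
print(f"Unsupervised cluster centers: {[round(c, 2) for c in centers]}")
```

The supervised model needs the panel labels to place its threshold, whereas the clustering step finds the two groups from the sugar values alone; real flavor models work with many more variables than this single toy feature.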
Widely used machine learning algorithms, commonly implemented in Python and R, include linear regression, decision trees, random forests and gradient boosting algorithms such as XGBoost and CatBoost.
Building Machine Learning Models that Predict Flavors
Fruits contain a variety of compounds, including acids, sugars and volatiles, whose composition is influenced by genetic and environmental factors. Sugars and acids are mostly detected by taste receptors on the tongue, while volatiles are detected in the olfactory epithelium.
Dr. Marcio Resende led the team at the University of Florida’s Institute of Food and Agricultural Sciences that proposed to map data from volatiles in blueberries and tomatoes into statistical models. This data was readily available from the University of Florida’s decade-long blueberry and tomato breeding program.
The team proposed to pair this information with consumer sensory panel data to build models that could predict flavor preferences. Such models could identify specific metabolites (small-molecule chemicals) for marker-assisted selection (MAS), helping crop producers grow flavor-rich fruit. This would allow growers to select from hundreds of genotypes per season, increasing their throughput in flavor evaluations (flavor phenotyping).
The research team fed a variety of blueberries and tomatoes to consumers at the University of Florida’s Sensory Lab. They then collected feedback on umami, sweetness, sourness, flavor intensity and liking.
Most flavor studies use partial least squares (PLS) and linear regression models. However, the large variety of chemical compounds found in fruit presents a number of challenges.
The novelty of the Florida team’s approach is that they used self-learning computational algorithms to build their statistical model at the metabolome (small-molecule chemical) level. Their aim was to map the chemical profile of fruit to flavor perception.
The team implemented 18 statistical and machine learning models to predict sensory traits from sugars, acids and volatiles. The models included tree-based methods (XGBoost, random forest, gradient boosting machines), neural networks (Bayesian, multilayer perceptron), genomic selection methods (Bayes A, Bayes B), regularized regression (elastic net, ridge regression, LASSO), and kernel methods (relevance vector machines, support vector machines).
All the models were written in R, using the BGLR package for the Bayesian models and the caret package for the machine learning models. The team used 10-fold cross-validation to evaluate their models, scoring each fold by the correlation between the consumer panel ratings and the predicted ratings.
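The evaluation scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python rather than the R used by the team, the data are synthetic, and a one-predictor least-squares line stands in for the actual models. The function names `pearson`, `fit_line` and `cross_validate` are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, k=10, seed=0):
    """Return one correlation per fold between observed and predicted ratings."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train = [i for i in indices if i not in held_out_set]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        predicted = [slope * xs[i] + intercept for i in held_out]
        observed = [ys[i] for i in held_out]
        scores.append(pearson(observed, predicted))
    return scores


# Synthetic stand-in data: one "metabolite" level and a noisy "liking" score.
rng = random.Random(42)
metabolite = [rng.uniform(0, 10) for _ in range(100)]
liking = [0.5 * m + rng.gauss(0, 1) for m in metabolite]

fold_scores = cross_validate(metabolite, liking, k=10)
mean_score = sum(fold_scores) / len(fold_scores)
print(f"Mean correlation across {len(fold_scores)} folds: {mean_score:.2f}")
```

Because every sample is held out exactly once, each prediction is scored on data the model did not see during fitting, which is what makes the per-fold correlations a fair basis for comparing models.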
The XGBoost model, in particular, showed an average improvement of 20% over the linear regression models (and 11% over the PLS models) conventionally used. The prediction accuracy of XGBoost ranged from 0.62 to 0.87 across all traits of tomatoes and blueberries. The full model showed improvements of 3.2% to 36.7% compared to a model that accounted only for sugars and acids.
The Florida researchers have demonstrated the superior accuracy of metabolomic machine learning models over conventional genomic models in predicting consumer flavor preferences. This gives crop producers an unprecedented opportunity to control the quality of their produce at scale while also maximizing yields.
We expect that these models will enable an earlier incorporation of flavor as a breeding target and encourage selection and release of more flavorful fruit varieties.
Dr. Marcio Resende, Institute of Food and Agricultural Sciences, University of Florida
References and Further Reading
Klee, H., et al., (2022) Metabolomic selection for enhanced fruit flavor. PNAS [online] Available at: https://www.pnas.org/content/119/7/e2115865119
Buck, B. (2022) UF/IFAS researchers creating an ‘Artificial Intelligence Connoisseur’ [online] Available at: https://blogs.ifas.ufl.edu/news/2022/02/07/uf-ifas-researchers-creating-an-artificial-intelligence-connoisseur/