To maximize the general accuracy of these machine learning (ML) models, he said, it is essential to train them on a highly diverse dataset. The challenge is that it is not obvious, a priori, which training data the ML model will need most.
The team's recent work presents an automated "active learning" methodology for iteratively building a training dataset.
At each iteration, the method uses the current best ML model to perform atomistic simulations. When these simulations encounter new physical situations that lie beyond the ML model's knowledge, new reference data are collected via expensive quantum simulations, and the ML model is retrained.
Through this process, the active learning procedure collects data on many different types of atomic configurations, including a variety of crystal structures and a variety of defect patterns within crystals.
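To make the iterative loop concrete, here is a minimal one-dimensional toy sketch in Python. It is not the team's implementation. Every name and choice below is a hypothetical stand-in: `reference_energy` plays the role of the expensive quantum calculation, `NearestNeighborModel` plays the ML model, the "atomistic simulation" is a Metropolis Monte Carlo walk driven by the model's predictions, and "beyond the model's knowledge" is approximated by the distance to the nearest training configuration exceeding an assumed `threshold`. The paper's actual models and uncertainty criterion are not described in this article.

```python
import math
import random


def reference_energy(x):
    """HYPOTHETICAL stand-in for an expensive quantum (e.g. DFT) calculation."""
    return math.sin(3.0 * x) + 0.5 * x * x


class NearestNeighborModel:
    """HYPOTHETICAL toy surrogate: predicts the energy of the closest training point."""

    def fit(self, data):
        self.data = list(data)

    def predict(self, x):
        return min(self.data, key=lambda p: abs(p[0] - x))[1]

    def uncertainty(self, x):
        # Assumed novelty measure: distance to the nearest known configuration.
        return min(abs(p[0] - x) for p in self.data)


def run_simulation(model, x0, n_steps, step, temperature, threshold, rng):
    """Metropolis Monte Carlo driven by the surrogate model's energies.

    Returns the first proposed configuration whose uncertainty exceeds
    `threshold`, or None if the trajectory never leaves familiar territory.
    """
    x, e = x0, model.predict(x0)
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        if model.uncertainty(trial) > threshold:
            return trial  # unfamiliar configuration: stop and query the reference
        e_trial = model.predict(trial)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial  # accept the Metropolis move
    return None


def active_learning(n_iterations=20, threshold=0.15, seed=0):
    rng = random.Random(seed)
    data = [(0.0, reference_energy(0.0))]  # small initial dataset
    model = NearestNeighborModel()
    model.fit(data)
    for _ in range(n_iterations):
        x_new = run_simulation(model, x0=0.0, n_steps=200, step=0.1,
                               temperature=1.0, threshold=threshold, rng=rng)
        if x_new is None:
            break  # nothing new encountered; stop collecting
        data.append((x_new, reference_energy(x_new)))  # expensive reference call
        model.fit(data)  # retrain on the enlarged dataset
    return data, model


if __name__ == "__main__":
    data, model = active_learning()
    print(len(data), "reference calculations collected")
```

The key design point mirrors the description above: the reference function is called only when the model-driven simulation wanders somewhere the current dataset does not cover, so the dataset grows toward the configurations the simulation actually visits rather than being chosen in advance.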
The paper: Automated discovery of a robust interatomic potential for aluminum, Nature Communications, DOI: 10.1038/s41467-021-21376-0
The funding: This work was funded in part by the Los Alamos National Laboratory Advanced Simulation and Computing (ASC) program, and computer time was provided by the Lawrence Livermore National Laboratory Sierra Supercomputer during its open access period.