A new way to represent atomic species improves molecular machine learning models

AUG 23, 2019

The creation of elemental modes to represent atomic species enables transfer learning and lowers computational cost, bringing chemical modeling closer to a single machine learning model that can be fine-tuned for different systems.

Chris Patrick

DOI: 10.1063/1.5123456


Machine learning helps chemists model molecules and predict their properties. Currently, however, every atomic species in a molecule requires a different machine learning model, which is computationally costly and prevents information sharing between models. Developing a better way to represent atomic species is a necessary step toward a single machine learning model that could be applied to any system.

Herr et al. present a new way to represent atomic species, called elemental modes. The authors identified a set of physical properties for each atomic species and compressed these properties into a lower-dimensional space with an auto-encoder, a type of artificial neural network. These compressed representations are the elemental modes, which retain periodic table trends while remaining scalable for machine learning models.
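
As a rough illustration of the compression step, the sketch below trains a tiny auto-encoder in plain Python. It is a minimal sketch under stated assumptions, not the authors' implementation: the four descriptors, the six elements, the two-dimensional latent space, and the training settings are all chosen here for illustration, and the names PROPERTIES, train_autoencoder, and encode are hypothetical.

```python
import math
import random

# Illustrative descriptors (an assumption, not the paper's property set):
# atomic number, Pauling electronegativity, valence electrons, period.
PROPERTIES = {
    "H":  [1, 2.20, 1, 1],
    "C":  [6, 2.55, 4, 2],
    "N":  [7, 3.04, 5, 2],
    "O":  [8, 3.44, 6, 2],
    "F":  [9, 3.98, 7, 2],
    "Cl": [17, 3.16, 7, 3],
}

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def encode(enc, x):
    """Map a property vector to its compressed (latent) representation."""
    return [math.tanh(a) for a in matvec(enc, x)]

def train_autoencoder(data, latent_dim=2, epochs=3000, lr=0.01, seed=0):
    """Train a tanh encoder and linear decoder with per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(latent_dim)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(n_in)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            z = encode(enc, x)
            y = matvec(dec, z)
            err = [yi - xi for yi, xi in zip(y, x)]
            total += sum(e * e for e in err)
            # Gradients of 0.5 * squared reconstruction error,
            # computed before either weight matrix is updated.
            dz = [sum(err[i] * dec[i][j] for i in range(n_in))
                  for j in range(latent_dim)]
            da = [dz[j] * (1.0 - z[j] ** 2) for j in range(latent_dim)]
            for i in range(n_in):
                for j in range(latent_dim):
                    dec[i][j] -= lr * err[i] * z[j]
            for j in range(latent_dim):
                for k in range(n_in):
                    enc[j][k] -= lr * da[j] * x[k]
        losses.append(total / len(data))  # mean squared reconstruction error per element
    return enc, dec, losses

symbols = list(PROPERTIES)
data = standardize([PROPERTIES[s] for s in symbols])
enc, dec, losses = train_autoencoder(data)

# The latent vectors play the role of the compressed representation.
modes = {s: encode(enc, x) for s, x in zip(symbols, data)}
for symbol, mode in modes.items():
    print(symbol, [round(v, 3) for v in mode])
```

Each element ends up as a short vector of continuous values rather than a separate category, which is what would let a single downstream model share information across species.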

To evaluate their performance, the authors used the elemental modes to train a neural network to predict formation energies of a crystalline material. The network did so with increased accuracy, demonstrating that the elemental modes could help rapidly screen new materials and drug candidates before synthesis at lower computational cost.

The neural network was also able to generalize its knowledge of one element to improve predictions for another. When the authors removed chlorine from the training data, the network was still able to extrapolate information about chlorine from its knowledge of other elements. This transfer learning reduces the amount of required training data.

Author John Herr said the work demonstrates that it is possible to take generalized models trained on large datasets and fine-tune them to a specific system with smaller amounts of data.

Source: “Compressing physics with an auto-encoder: Creating an atomic species representation to improve machine learning models in the chemical sciences,” by John E. Herr, Kevin Koh, Kun Yao, and John Parkhill, The Journal of Chemical Physics (2019). The article can be accessed at https://doi.org/10.1063/1.5108803 .