Supervised machine learning looks to help researchers design collective variables
Collective variables are functions, designed using simulation data, that provide a simplified representation of complex molecular systems. Developing new methods to identify optimal collective variables for describing protein dynamics is a highly active area of data science applied to biochemical physics.
In computationally intensive molecular dynamics simulations, choosing an appropriate collective variable is especially important, yet defining one is often challenging. Advances in machine learning promise new tools that improve our ability to define collective variables effectively from biomolecular simulation data.
Sultan and Pande demonstrated a method for designing collective variables for accelerated sampling with the help of supervised machine learning (SML) algorithms. Using solvated alanine dipeptide (a small model peptide) and the mini-protein Chignolin as examples, the researchers showed that their SML techniques produce a first estimate of collective variables from limited data, which can then be refined through other forms of parameter optimization.
The authors report several classifier-derived collective variables that can be used to reversibly sample slow structural transitions between protein conformational states, including the output probability estimates of logistic regression models and the outputs of shallow or deep neural network classifiers. Sultan said he hopes the paper can serve as a bridge between enhanced sampling and the group's previous work on machine learning and Markov state modeling.
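As a rough illustration of the classifier-as-collective-variable idea, the sketch below trains a plain logistic-regression model to separate frames from two labeled conformational states and then uses its predicted probability as a one-dimensional collective variable. This is not the authors' code: the two-feature input, the synthetic Gaussian "frames," and the function names `train_logistic` and `cv` are all hypothetical stand-ins for real features computed from simulation data.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def cv(x, w, b):
    """Hypothetical CV: estimated probability that frame x belongs to state B."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical training data: frames from two metastable states, each
# described by two made-up features, labeled 0 (state A) or 1 (state B).
random.seed(0)
state_a = [[random.gauss(-1.0, 0.3), random.gauss(0.0, 0.3)] for _ in range(50)]
state_b = [[random.gauss(1.0, 0.3), random.gauss(0.5, 0.3)] for _ in range(50)]
X = state_a + state_b
y = [0] * len(state_a) + [1] * len(state_b)

w, b = train_logistic(X, y)

# The CV is low deep in state A, high deep in state B, and varies smoothly
# in between; in an enhanced-sampling run, a bias would be applied to it.
print(round(cv([-1.0, 0.0], w, b), 3))  # center of state A
print(round(cv([0.0, 0.25], w, b), 3))  # between the states
print(round(cv([1.0, 0.5], w, b), 3))   # center of state B
```

Using the probability rather than the hard class label matters here: a label only says which state a frame is in, while the probability changes continuously along the transition, which is the property a collective variable for enhanced sampling needs.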
“We hope this will allow researchers to worry less about the design of enhanced sampling simulations, allowing them to focus more on interpreting the results or designing new simulations,” Sultan said. “We also hope that this will stimulate more discussion on the use of other ML algorithms for accelerating molecular simulations.”
Source: “Automated design of collective variables using supervised machine learning,” by Mohammad M. Sultan and Vijay S. Pande, The Journal of Chemical Physics (2018). The article can be accessed at https://doi.org/10.1063/1.5029972