When the average model knows best

AUG 07, 2020
Ensemble averaging allows machine learning models to be applied to small data sets, enabling more researchers to use machine learning in their work.

DOI: 10.1063/10.0001749

Machine learning typically requires large data sets for modeling, which creates a logistical challenge when collecting large amounts of data is impractical. With too little data, the quality of a trained model depends heavily on chance and can vary significantly from one model to the next.

Vanpoucke et al. show that ensemble-averaged models can mitigate the chance factor and irregularities that result from using small data sets.

“Our specific approach to ensemble models makes the resulting model simple and straightforward through their construction,” said author Danny E. P. Vanpoucke. “This also is very beneficial from the perspective of model interpretation, as the ensemble average corresponds to a single model instance; the model complexity is strikingly modest.”

By comparing different model realizations trained on the same data set, the authors found a wide variation in results, specifically for regression models. They found that averaging a set of model realizations into an ensemble-averaged model resulted in the best possible model quality, rather than merely average quality, as one might expect from an ensemble model.

“By showing that the qualities of individual model realizations are governed rather by luck than design, we now know that selecting the ‘best model realization’ from a set of 10 or 100 is not the best strategy. Instead, the average of the entire set will provide a high-quality model realization, even if all realizations of the set are of low quality,” said Vanpoucke.
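The idea of averaging model realizations can be illustrated with a minimal sketch. This is not the authors' code or data: it assumes a simple linear regression fitted with NumPy, synthetic data, and hypothetical helper names (`fit_linear`, `predict`, `ensemble_average`). For models that are linear in their coefficients, averaging the coefficients of all realizations gives one model whose predictions equal the average of the individual predictions, which is one way to read the statement that the ensemble average corresponds to a single model instance.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Evaluate a linear model given its coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def ensemble_average(X, y, n_models=100, train_fraction=0.8, seed=0):
    """Fit one model per random training subset and average the coefficients.

    The subset size and number of models are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_train = max(2, int(train_fraction * len(X)))
    coefs = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=n_train, replace=False)
        coefs.append(fit_linear(X[idx], y[idx]))
    coefs = np.array(coefs)
    return coefs.mean(axis=0), coefs

# Synthetic "small data" example (assumed, not from the paper): 25 noisy points.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(25, 1))
y = 1.5 * X[:, 0] + 0.5 + rng.normal(scale=0.3, size=25)

avg_coef, all_coefs = ensemble_average(X, y)

# Spread of individual realizations versus the single averaged model.
print("slope range across realizations:", all_coefs[:, 1].min(), all_coefs[:, 1].max())
print("averaged model coefficients:", avg_coef)
```

The averaged model here is a single coefficient vector of the same size as any one realization, so it is no more complex than the models it was built from.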

Research projects using big data sets will also benefit from this knowledge because it can reduce both the computational cost of training models and the amount of data required.

Source: “Small data materials design with machine learning: When the average model knows best,” by Danny E. P. Vanpoucke, Onno S. J. van Knippenberg, Ko Hermans, Katrien V. Bernaerts, and Siamak Mehrkanoon, Journal of Applied Physics (2020). The article can be accessed at https://doi.org/10.1063/5.0012285
