Abstract
Data-driven turbulence modelling approaches are gaining increasing interest
from the CFD community. Such approaches generally aim to improve the modelled
Reynolds stresses by leveraging data from high-fidelity, turbulence-resolving
simulations. However, using a machine learning (ML) model
introduces a new source of uncertainty: the ML model itself. Quantification of
this uncertainty is essential since the predictive capability of data-driven
models diminishes when predicting physics not seen during training. In this
work, we explore the suitability of Mondrian forests (MFs) for data-driven
turbulence modelling. MFs are claimed to possess many of the advantages of the
commonly used random forest (RF) machine learning algorithm, whilst offering
principled uncertainty estimates. On a manufactured test case these claims are
substantiated, provided feature selection is first performed to remove
irrelevant features from the training data. A data-driven turbulence modelling
test case is then constructed, in which the quantity to predict is a turbulence
anisotropy constant derived from high-fidelity data. A number of flows at several
Reynolds numbers are used for training and testing. Irrelevant features are not
found to be a problem here. MF predictions are found to be superior to those
obtained from a commonly used linear eddy viscosity model. Shapley values,
borrowed from game theory, are used to interpret the MF predictions. Predictive
uncertainty is found to be large in regions where the training data is not
representative. Additionally, the MF predictive uncertainty is compared to the
uncertainty estimated from applying jackknifing to random forest predictions,
and to an a priori statistical distance measure. In both cases the MF
uncertainty is found to exhibit stronger correlation with predictive errors,
which indicates it is a better measure of prediction confidence.
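
The comparison in the closing sentences can be illustrated with a minimal sketch: given per-point uncertainty estimates from two methods, correlate each with the absolute prediction error and prefer the estimate with the stronger correlation. The abstract does not say which correlation measure is used, so the Pearson coefficient here is an assumption, and all numbers and names (`std_a`, `std_b`, `uncertainty_error_correlation`) are hypothetical toy values rather than results from the work.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def uncertainty_error_correlation(y_true, y_pred, y_std):
    """Correlate predicted uncertainty with the absolute prediction error."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return pearson(y_std, abs_err)


# Hypothetical toy data: reference values and model predictions.
y_true = [0.10, 0.20, 0.30, 0.40, 0.50]
y_pred = [0.11, 0.19, 0.33, 0.46, 0.38]

# Two hypothetical uncertainty estimates for the same predictions.
std_a = [0.01, 0.01, 0.03, 0.05, 0.10]  # grows where the error grows
std_b = [0.05, 0.04, 0.05, 0.04, 0.05]  # roughly constant, uninformative

r_a = uncertainty_error_correlation(y_true, y_pred, std_a)
r_b = uncertainty_error_correlation(y_true, y_pred, std_b)

# The estimate with the larger coefficient is the better confidence measure
# under this (assumed) criterion.
print(f"estimate A: r = {r_a:.3f}")
print(f"estimate B: r = {r_b:.3f}")
```

In this toy setting estimate A tracks the error closely while estimate B does not, so A would be judged the more useful indicator of prediction confidence.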