In a recent study led by Dae Koh at Stanford University, three uncertainty quantification (UQ) methods were evaluated for deep neural networks on three tasks: single-particle classification, multi-particle classification, and semantic segmentation of high-resolution 3D liquid argon time projection chamber (LArTPC) energy deposition images. The three methods were model ensembling, Monte Carlo Dropout (MCD), and Evidential Deep Learning (EDL).
Model ensembling trains multiple instances of the same architecture, each with a different random initialization seed. In Naive Ensembling (NE), every member is trained on the same training dataset, which yields networks with identical architecture but different parameter values. Bootstrapped Ensembling (BE) is often preferred over naive ensembling for better generalization and stability. In BE, each member is instead trained on a bootstrap resample of the training set, built by drawing N examples with replacement from the full training set, where N is the size of that set.
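The difference between naive and bootstrapped ensembling comes down to which training indices each member sees. The sketch below, written with NumPy, shows both schemes, plus one common way to combine members at inference: averaging their softmax outputs and using the predictive entropy of the average as an uncertainty score. The function names are hypothetical, network training is omitted, and the aggregation rule is an assumed common choice rather than a confirmed detail of the study.

```python
import numpy as np


def naive_indices(n_train: int, n_members: int) -> list[np.ndarray]:
    """Naive ensembling: every member sees the full, identical training set."""
    return [np.arange(n_train) for _ in range(n_members)]


def bootstrap_indices(n_train: int, n_members: int, seed: int = 0) -> list[np.ndarray]:
    """Bootstrapped ensembling: one resample per member, drawn with replacement.

    Each resample has n_train entries, so some examples appear several times
    and others not at all, which makes the members' training sets differ.
    """
    rng = np.random.default_rng(seed)
    return [rng.integers(0, n_train, size=n_train) for _ in range(n_members)]


# In a full pipeline each member would also get its own initialization seed and
# be trained on its index set by a (hypothetical) train_member(...) routine.


def ensemble_predict(member_probs: np.ndarray) -> np.ndarray:
    """Average softmax outputs of shape (n_members, n_samples, n_classes)."""
    return member_probs.mean(axis=0)


def predictive_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Entropy of each averaged class distribution; higher means less certain."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)
```

For example, two members that output `[0.9, 0.1]` and `[0.5, 0.5]` for the same image average to `[0.7, 0.3]`. The disagreement between members raises the entropy above what the more confident member alone would report.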
The study found that ensemble methods achieved the highest accuracy and better distributional separation than Monte Carlo Dropout and evidential models, meaning their uncertainty estimates more clearly distinguished correct from incorrect predictions. The results also showed that the quality of uncertainty quantification depends strongly on the task, and that Bayesian models can be worse calibrated than deterministic networks.
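Calibration asks whether a model's predicted confidence matches how often it is actually right. A widely used summary is the expected calibration error (ECE): bin predictions by confidence, then take the sample-weighted average gap between accuracy and mean confidence in each bin. The sketch below assumes equal-width bins and top-class confidence. The study may report calibration with a different metric or binning, so treat this as an illustration of the concept only.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins: int = 10) -> float:
    """Equal-width-bin ECE.

    probs:  (n_samples, n_classes) predicted class probabilities.
    labels: (n_samples,) integer ground-truth classes.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)           # confidence of the predicted class
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(labels)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are (lo, hi]; top-class confidence is always > 0, so none is lost.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.sum() / n * gap
    return float(ece)
```

A model that always predicts class 0 with confidence 0.9 but is right only half the time has an ECE of 0.4, which signals overconfidence. A well-calibrated model scores close to 0.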
The benefit of a detailed assessment of different UQ algorithms on a complex, multi-objective task such as LArTPC data reconstruction is twofold. First, it lets machine learning practitioners evaluate the applicability of Bayesian deep learning (BDL) in a real-world setting. Second, it enables physicists to design neural networks that produce well-justified uncertainty estimates, which can be used to reject erroneous predictions and detect out-of-distribution instances.