Efficient Methods for ABC Design
Disciplines
Mathematics (100%)
Keywords
Bayesian experimental design, Approximate Bayesian Computation, Intractable Likelihoods, Gaussian processes, Expectation Propagation, Parallel Computing
Optimal experimental design is concerned with choosing the input factors of a statistical experiment, before the experiment is conducted, so that they are optimal for the goal of inference (estimation, prediction, or model discrimination) as measured by a suitable design criterion. In this way, the experimental effort can be reduced considerably. However, for many sophisticated statistical models used, for example, in biogenetics, epidemiology, or geostatistics, the likelihood has no analytical form, which makes it impossible to compute the design criteria in a straightforward manner. One way to obtain estimates of these design criteria is approximate Bayesian computation (ABC), which relies on simulating many observations from the statistical model. Accordingly, experimental design methods that use ABC to estimate the design criteria have been termed approximate Bayesian computation design (ABCD) methods. The ABCD methods developed so far are very simulation- and memory-intensive, so they can be used only for simple, low-dimensional design settings. To increase the efficiency, and therefore the applicability, of ABCD, we will investigate a number of extensions of ABCD and develop new methods that build on recent advances in simulation-based experimental design and in ABC. For example, we will consider employing more efficient ABC algorithms or using approximations to the model or to the design criteria. If the efficiency of ABCD can be increased sufficiently, it will become feasible to obtain designs for situations in which the true model is assumed to be among a set of candidate models but it is not known which one it is. Furthermore, we will make extensive use of parallel computing techniques to achieve substantial savings in computing time.
In the course of this project, we will develop and implement methods and algorithms that will be thoroughly tested on several suitable examples and applications to assess their usefulness, accuracy, and efficiency. One particular application that we will consider is finding designs for collective cell spreading models, which help to understand wound healing and tumor growth.
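As a rough illustration of how ABC can turn simulations into an estimate of a design criterion, the following minimal sketch uses plain ABC rejection on a toy problem. Everything specific in it is an assumption made for illustration and is not taken from the project: the linear-Gaussian model in `simulate`, the standard normal prior, the choice of posterior precision as the utility, and the sample sizes. It is not one of the efficient ABCD variants the project sets out to develop.

```python
import random
import statistics


def simulate(theta, d, rng):
    # Hypothetical toy model: response is linear in the design d, unit Gaussian noise.
    return theta * d + rng.gauss(0.0, 1.0)


def abc_posterior_sample(y_obs, d, rng, n_sim=2000, keep=100):
    # ABC rejection: draw theta from the prior, simulate data at design d,
    # and keep the draws whose simulated data lie closest to y_obs.
    draws = []
    for _ in range(n_sim):
        theta = rng.gauss(0.0, 1.0)  # assumed standard normal prior
        draws.append((abs(simulate(theta, d, rng) - y_obs), theta))
    draws.sort(key=lambda pair: pair[0])
    return [theta for _, theta in draws[:keep]]


def abc_design_criterion(d, rng, n_outer=30):
    # Average an assumed utility (precision of the ABC posterior sample)
    # over datasets simulated from the prior predictive distribution at d.
    total = 0.0
    for _ in range(n_outer):
        theta_true = rng.gauss(0.0, 1.0)
        y_obs = simulate(theta_true, d, rng)
        posterior = abc_posterior_sample(y_obs, d, rng)
        total += 1.0 / statistics.variance(posterior)
    return total / n_outer


rng = random.Random(0)
utilities = {d: abc_design_criterion(d, rng) for d in (0.0, 1.0, 3.0)}
best_design = max(utilities, key=utilities.get)
```

Even in this one-dimensional toy case, each candidate design requires tens of thousands of model simulations, which hints at why such methods become expensive in higher-dimensional settings.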
Carefully planning the settings of the controllable factors of an experiment before conducting it can greatly increase the amount of information gained about the underlying process. The goal of the statistical analysis is encoded in a design criterion, which is then optimized with respect to the experimental design. Many statistical models are so complex, however, that the design criteria cannot be computed analytically. We have developed a simulation-based approach that uses machine learning methods to estimate these criteria efficiently. The optimal design can then be found by optimizing over the estimated criteria. Previous simulation-based approaches required much larger simulated samples than our new approach and were therefore not suitable for more complex design problems. We propose training a machine learning method on data simulated from the models in order to estimate the predictive functions for the parameters of these models given the model output. These predictive functions can be used to quickly estimate the expected information gain about the parameters at each design configuration. There are two requirements for our method to work well. First, it must be possible to simulate efficiently from the models under consideration. Second, the machine learning method applied must be fast yet sufficiently accurate in its predictions. Furthermore, it should be relatively easy to handle. We show that some off-the-shelf machine learning techniques meet these requirements in all the examples we consider. Our approach is therefore relatively easy to implement and can readily be applied by practitioners.
Common statistical goals for experimental design are to estimate the parameters of a particular model efficiently or to determine efficiently which of several candidate models is most likely to have generated the observed data. Especially for the second goal, we have considered a variety of examples. One very practical application is to determine the optimal observation times of a cell experiment that is conducted to find out which of three possible models best explains the evolution of the number of bacteria within phagocytic cells. The choice of model determines the mechanism responsible for the observed heterogeneity in the dynamics of bacterial reproduction in those cells. In another example, we seek the observation times that best distinguish between different epidemiological models describing the number of infected individuals over time. In yet another example, we seek the optimal locations in two-dimensional space for discriminating between processes governing the distribution of extreme observations. These examples show that our method can be used to find the optimal experimental design for a wide variety of interesting models.
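The idea of training a learning method on simulated data and using its predictions to score a design can be sketched for the model discrimination goal as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the two growth models in `simulate`, the uniform prior on the rate, the hand-written k-nearest-neighbour classifier (a stand-in for an off-the-shelf method), and the use of held-out classification accuracy as a proxy for the design criterion are all hypothetical choices made for illustration.

```python
import random


def simulate(model, t, rng):
    # Hypothetical candidate models observed at time t:
    # model 0 grows linearly, model 1 quadratically; the rate a has an assumed prior.
    a = rng.uniform(0.5, 1.5)
    mean = a * t if model == 0 else a * t * t
    return mean + rng.gauss(0.0, 1.0)


def simulate_labelled(t, n, rng):
    # Draw a model index with equal prior probability, then simulate from that model.
    data = []
    for _ in range(n):
        model = rng.randrange(2)
        data.append((simulate(model, t, rng), model))
    return data


def knn_predict(train, y, k=15):
    # Majority vote among the k training observations closest to y.
    nearest = sorted(train, key=lambda pair: abs(pair[0] - y))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def classification_utility(t, rng, n_train=1000, n_test=500):
    # Assumed utility: fraction of held-out simulated datasets whose
    # generating model is identified correctly at observation time t.
    train = simulate_labelled(t, n_train, rng)
    test = simulate_labelled(t, n_test, rng)
    correct = sum(knn_predict(train, y) == label for y, label in test)
    return correct / n_test


rng = random.Random(1)
accuracy = {t: classification_utility(t, rng) for t in (1.0, 2.0, 3.0)}
best_time = max(accuracy, key=accuracy.get)
```

In this toy setting the two models coincide at t = 1, so the classifier can do little better than guessing there, while later observation times separate the models more clearly. Only a few thousand simulations per candidate design are needed, because the classifier is trained once per design and reused for all held-out datasets.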
Research Output
- 34 Citations
- 6 Publications
- 1 Datasets & models
- 1 Disseminations
2022
Title: A convex approach to optimum design of experiments with correlated observations; DOI: 10.1214/22-ejs2071; Type: Journal Article; Author: Pázman A; Journal: Electronic Journal of Statistics
2019
Title: Sequential Experimental Design for Predator-Prey Functional Response Experiments; DOI: 10.48550/arxiv.1907.02179; Type: Preprint; Author: Moffat H
2021
Title: A convex approach to optimum design of experiments with correlated observations; DOI: 10.48550/arxiv.2103.02989; Type: Preprint; Author: Pázman A
2020
Title: Sequential experimental design for predator–prey functional response experiments; DOI: 10.1098/rsif.2020.0156; Type: Journal Article; Author: Moffat H; Journal: Journal of the Royal Society Interface; Pages: 20200156
2022
Title: Optimal Bayesian design for model discrimination via classification; DOI: 10.1007/s11222-022-10078-2; Type: Journal Article; Author: Hainy M; Journal: Statistics and Computing; Pages: 25
2018
Title: ABC model selection for spatial extremes models applied to South Australian maximum temperature data; DOI: 10.1016/j.csda.2018.06.019; Type: Journal Article; Author: Lee X; Journal: Computational Statistics & Data Analysis; Pages: 128-144