Data assimilation
Data assimilation is the process of estimating the state of a system by combining model forecasts with observations.
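To make "combining model forecasts with observations" concrete, here is a minimal sketch of one analysis step of a stochastic (perturbed-observation) ensemble Kalman filter (EnKF), the family of methods that several of the publications below build on. It assumes a linear observation operator `H` and Gaussian observation errors with covariance `R`. The function name `enkf_analysis` and the toy numbers are hypothetical illustrations, not code from any of these papers.

```python
import numpy as np


def enkf_analysis(ensemble, y, H, R, rng):
    """One stochastic (perturbed-observation) EnKF analysis step.

    ensemble: (n_state, n_members) forecast ensemble
    y:        (n_obs,) observation vector
    H:        (n_obs, n_state) linear observation operator (assumed linear)
    R:        (n_obs, n_obs) observation-error covariance
    """
    n_members = ensemble.shape[1]
    # Anomalies about the forecast ensemble mean
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    # Sample background-error covariance P_f = A A^T / (N - 1)
    P_f = anomalies @ anomalies.T / (n_members - 1)
    # Kalman gain K = P_f H^T (H P_f H^T + R)^{-1}, computed with a linear solve
    S = H @ P_f @ H.T + R
    K = np.linalg.solve(S, H @ P_f).T
    # Each member assimilates its own perturbed copy of the observation
    obs_noise = rng.multivariate_normal(np.zeros(len(y)), R, size=n_members).T
    y_perturbed = y[:, None] + obs_noise
    # Move every member toward its perturbed observation
    return ensemble + K @ (y_perturbed - H @ ensemble)


rng = np.random.default_rng(0)
# Hypothetical two-component forecast ensemble with positively correlated components
forecast = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=50).T
H = np.array([[1.0, 0.0]])  # observe only the first component
R = np.array([[0.1]])
y = np.array([2.0])
analysis = enkf_analysis(forecast, y, H, R, rng)
print("forecast mean:", forecast.mean(axis=1))
print("analysis mean:", analysis.mean(axis=1))
```

Only the first component is observed, yet the second component is also pulled toward the observation through the ensemble cross-covariance in `P_f`. In ensemble-based coupled DA, this same mechanism lets an observation of one component (say, the ocean) correct another (say, the atmosphere).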
Related publications
- David Vishny, Matthias Morzfeld, Kyle Gwirtz, Eviatar Bach, Oliver R. A. Dunbar, and Daniel Hodyss. Journal of Advances in Modeling Earth Systems, Aug 2024
We synthesize knowledge from numerical weather prediction, inverse theory, and statistics to address the problem of estimating a high-dimensional covariance matrix from a small number of samples. This problem is fundamental in statistics, machine learning/artificial intelligence, and in modern Earth science. We create several new adaptive methods for high-dimensional covariance estimation, but one method, which we call Noise-Informed Covariance Estimation (NICE), stands out because it has three important properties: (a) NICE is conceptually simple and computationally efficient; (b) NICE guarantees symmetric positive semi-definite covariance estimates; and (c) NICE is largely tuning-free. We illustrate the use of NICE on a large set of Earth science–inspired numerical examples, including cycling data assimilation, inversion of geophysical field data, and training of feed-forward neural networks with time-averaged data from a chaotic dynamical system. Our theory, heuristics and numerical tests suggest that NICE may indeed be a viable option for high-dimensional covariance estimation in many Earth science problems.
- Eviatar Bach*, Tim Colonius, Isabel Scherl, and Andrew Stuart. Chaos: An Interdisciplinary Journal of Nonlinear Science, Mar 2024
We consider the problem of filtering dynamical systems, possibly stochastic, using observations of statistics. Thus, the computational task is to estimate a time-evolving density ρ(v, t) given noisy observations of the true density ρ†; this contrasts with the standard filtering problem based on observations of the state v. The task is naturally formulated as an infinite-dimensional filtering problem in the space of densities ρ. However, for the purposes of tractability, we seek algorithms in state space; specifically, we introduce a mean-field state-space model, and using interacting particle system approximations to this model, we propose an ensemble method. We refer to the resulting methodology as the ensemble Fokker–Planck filter (EnFPF). Under certain restrictive assumptions, we show that the EnFPF approximates the Kalman–Bucy filter for the Fokker–Planck equation, which is the exact solution to the infinite-dimensional filtering problem. Furthermore, our numerical experiments show that the methodology is useful beyond this restrictive setting. Specifically, the experiments show that the EnFPF is able to correct ensemble statistics, to accelerate convergence to the invariant density for autonomous systems, and to accelerate convergence to time-dependent invariant densities for non-autonomous systems. We discuss possible applications of the EnFPF to climate ensembles and to turbulence modeling.
- Eviatar Bach* and Michael Ghil. Journal of Advances in Modeling Earth Systems, Jan 2023
Data assimilation (DA) aims to optimally combine model forecasts and observations that are both partial and noisy. Multi-model DA generalizes the variational or Bayesian formulation of the Kalman filter, and we prove that it is also the minimum variance linear unbiased estimator. Here, we formulate and implement a multi-model ensemble Kalman filter (MM-EnKF) based on this framework. The MM-EnKF can combine multiple model ensembles for both DA and forecasting in a flow-dependent manner; it uses adaptive model error estimation to provide matrix-valued weights for the separate models and the observations. We apply this methodology to various situations using the Lorenz96 model for illustration purposes. Our numerical experiments include multiple models with parametric error, different resolved scales, and different fidelities. The MM-EnKF results in significant error reductions compared to the best model, as well as to an unweighted multi-model ensemble, with respect to both probabilistic and deterministic error metrics.
- Ashesh Chattopadhyay, Ebrahim Nabizadeh, Eviatar Bach, and Pedram Hassanzadeh. Journal of Computational Physics, Mar 2023
Data assimilation (DA) is a key component of many forecasting models in science and engineering. DA allows one to estimate better initial conditions using an imperfect dynamical model of the system and noisy/sparse observations available from the system. The ensemble Kalman filter (EnKF) is a DA algorithm that is widely used in applications involving high-dimensional nonlinear dynamical systems. However, EnKF requires evolving large ensembles of forecasts using the dynamical model of the system. This often becomes computationally intractable, especially when the number of states of the system is very large, e.g., for weather prediction. With small ensembles, the estimated background error covariance matrix in the EnKF algorithm suffers from sampling error, leading to an erroneous estimate of the analysis state (the initial condition for the next forecast cycle). In this work, we propose a hybrid ensemble Kalman filter (H-EnKF) and apply it to a two-layer quasi-geostrophic turbulent flow as a test case. This framework uses a pre-trained, deep learning-based, data-driven surrogate that inexpensively generates and evolves a large data-driven ensemble of the states to accurately compute the background error covariance matrix with smaller sampling errors. The H-EnKF framework outperforms EnKF with only the dynamical model or only the data-driven surrogate, and estimates a better initial condition without the need for any ad hoc localization strategies. H-EnKF can be extended to any ensemble-based DA algorithm, e.g., particle filters, which are currently too expensive to use for high-dimensional systems.
- Ashesh Chattopadhyay, Mustafa Mustafa, Pedram Hassanzadeh, Eviatar Bach, and Karthik Kashinath. Geoscientific Model Development, Mar 2022
There is growing interest in data-driven weather prediction (DDWP), e.g., using convolutional neural networks such as U-NET that are trained on data from models or reanalysis. Here, we propose three components, inspired by physics, to integrate with commonly used DDWP models in order to improve their forecast accuracy. These components are (1) a deep spatial transformer added to the latent space of U-NET to capture rotation and scaling transformations in the latent space for spatiotemporal data, (2) a data-assimilation (DA) algorithm to ingest noisy observations and improve the initial conditions for the next forecasts, and (3) a multi-time-step algorithm, which combines forecasts from DDWP models with different time steps through DA, improving the accuracy of forecasts at short intervals. To show the benefit and feasibility of each component, we use geopotential height at 500 hPa (Z500) from ERA5 reanalysis and examine the short-term forecast accuracy of specific setups of the DDWP framework. Results show that the spatial-transformer-based U-NET (U-STN) clearly outperforms the U-NET, e.g., improving the forecast skill by 45 %. Using a sigma-point ensemble Kalman (SPEnKF) algorithm for DA and U-STN as the forward model, we show that stable, accurate DA cycles are achieved even with high observation noise. This DDWP+DA framework substantially benefits from large (O(1000)) ensembles that are inexpensively generated with the data-driven forward model in each DA cycle. The multi-time-step DDWP+DA framework also shows promise; for example, it reduces the average error by factors of 2–3. These results show the benefits and feasibility of these three components, which are flexible and can be used in a variety of DDWP setups. Furthermore, while here we focus on weather forecasting, the three components can be readily adapted for other parts of the Earth system, such as ocean and land, for which there is a rapid growth of data and need for forecast and assimilation.
- Journal of Climate, Jul 2021
Oscillatory modes of the climate system are among its most predictable features, especially at intraseasonal time scales. These oscillations can be predicted well with data-driven methods, often with better skill than dynamical models. However, since the oscillations only represent a portion of the total variance, a method for beneficially combining oscillation forecasts with dynamical forecasts of the full system was not previously known. We introduce Ensemble Oscillation Correction (EnOC), a general method to correct oscillatory modes in ensemble forecasts from dynamical models. We compute the ensemble mean—or the ensemble probability distribution—with only the best ensemble members, as determined by their discrepancy from a data-driven forecast of the oscillatory modes. We also present an alternate method that uses ensemble data assimilation to combine the oscillation forecasts with an ensemble of dynamical forecasts of the system (EnOC-DA). The oscillatory modes are extracted with a time series analysis method called multichannel singular spectrum analysis (M-SSA), and forecast using an analog method. We test these two methods using chaotic toy models with significant oscillatory components and show that they robustly reduce error compared to the uncorrected ensemble. We discuss the applications of this method to improve prediction of monsoons as well as other parts of the climate system. We also discuss possible extensions of the method to other data-driven forecasts, including machine learning.
- Stephen G. Penny, Eviatar Bach, Kriti Bhargava, Chu-Chun Chang, Cheng Da, Luyu Sun, and Takuma Yoshida. Journal of Advances in Modeling Earth Systems, Jun 2019
Strongly coupled data assimilation (SCDA) views the Earth as one unified system. This allows observations to have an instantaneous impact across boundaries such as the air-sea interface when estimating the state of each individual component. Operational prediction centers are moving toward Earth system modeling for all forecast timescales, ranging from days to months. However, there have been few studies that examine fundamental aspects of SCDA and the transition from traditional approaches that apply data assimilation only to a single component, whether forecasts were derived from a coupled model or an uncoupled forced model. The SCDA approach is examined here in detail using numerical experiments with a simple coupled atmosphere-ocean quasi-geostrophic model. The effect of coupling on the Lyapunov spectrum and on the stability of the data assimilation system is explored. Different data assimilation methods are compared within the context of SCDA, including the 3-D and 4-D Variational methods, the ensemble Kalman filter, and the hybrid gain method. The impact of observing system coverage is also investigated. We find that SCDA is generally superior to weakly coupled or uncoupled approaches. Dynamically defined background error covariance estimates are essential for SCDA to achieve an accurate coupled state estimate as the observing system becomes sparser. As a clarification of seemingly contradictory findings from previous studies, it is shown that ocean observations can adequately constrain atmospheric state estimates provided that the analysis-observing frequency is sufficiently high and the ensemble size determining the background error covariance is sufficiently large.
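The MM-EnKF abstract above describes multi-model DA as the minimum variance linear unbiased estimator, with matrix-valued weights for the separate models and the observations. A textbook building block of that idea is inverse-covariance weighting of independent, unbiased estimates with known error covariances, sketched below. This is only an illustration under those assumptions: the MM-EnKF itself estimates the covariances from ensembles using adaptive model-error estimation, which is not shown. The name `blue_combine` and all numbers are hypothetical.

```python
import numpy as np


def blue_combine(estimates, covariances):
    """Minimum-variance linear unbiased combination of independent estimates.

    estimates:   list of (n,) state estimates (model forecasts, or observations
                 already mapped to state space), each assumed unbiased
    covariances: list of (n, n) error covariances; errors assumed independent
    Returns the combined estimate, its error covariance, and the matrix weights.
    """
    precisions = [np.linalg.inv(P) for P in covariances]
    combined_cov = np.linalg.inv(sum(precisions))
    # Matrix weight for source i: W_i = (sum_j P_j^-1)^-1 P_i^-1; the W_i sum to I
    weights = [combined_cov @ Pinv for Pinv in precisions]
    combined = sum(W @ x for W, x in zip(weights, estimates))
    return combined, combined_cov, weights


model_a = np.array([1.0, 0.0])  # hypothetical model A: accurate in component 0
model_b = np.array([3.0, 2.0])  # hypothetical model B: accurate in component 1
obs = np.array([2.0, 1.0])      # hypothetical observation of both components
P_a = np.diag([1.0, 4.0])
P_b = np.diag([4.0, 1.0])
R = np.diag([2.0, 2.0])
x, P, W = blue_combine([model_a, model_b, obs], [P_a, P_b, R])
print("combined estimate:", x)
print("combined error variances:", np.diag(P))
```

Because the weights are matrices rather than scalars, model A dominates the first component and model B the second. The combined error variance is smaller than that of any single source.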
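The NICE and H-EnKF abstracts above both start from the same difficulty: a covariance matrix estimated from only a few ensemble members is contaminated by sampling error. The sketch below measures that error for several ensemble sizes. It assumes a true covariance equal to the identity, so every nonzero off-diagonal entry of the estimate is spurious. It only illustrates the problem; it does not implement NICE, the H-EnKF, or localization, and `offdiag_sampling_error` is a hypothetical helper.

```python
import numpy as np


def offdiag_sampling_error(n_state, n_members, rng):
    """Mean |off-diagonal entry| of a sample covariance whose true value is I.

    All true off-diagonal entries are zero, so the returned value is pure
    sampling error.
    """
    samples = rng.standard_normal((n_state, n_members))
    # np.cov treats rows as variables and columns as samples (normalized by N - 1)
    C = np.cov(samples)
    off_diagonal = C[~np.eye(n_state, dtype=bool)]
    return np.abs(off_diagonal).mean()


rng = np.random.default_rng(1)
errors = {n: offdiag_sampling_error(100, n, rng) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"{n:5d} members: mean |spurious covariance| = {err:.3f}")
```

The spurious entries shrink roughly like one over the square root of the ensemble size. With 10 members and 100 variables, the sample covariance also has rank at most 9. This small-sample regime is the one that localization, NICE, and the large surrogate-generated ensembles of the H-EnKF are each designed to handle.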
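The member-selection step of EnOC described in the Journal of Climate abstract above can be sketched as follows. The sketch assumes that the oscillatory component of each ensemble member has already been extracted (the paper uses M-SSA, which is not shown) and that a data-driven forecast of those modes is available. The Euclidean discrepancy, the value of `n_keep`, the function name `select_best_members`, and the toy numbers are all illustrative assumptions rather than the published method's exact choices. The EnOC-DA variant is not shown.

```python
import numpy as np


def select_best_members(ensemble, member_modes, mode_forecast, n_keep):
    """Select the members whose oscillatory part best matches a data-driven forecast.

    ensemble:      (n_members, n_state) full-state dynamical forecasts
    member_modes:  (n_members, n_modes) oscillatory component of each member
                   (assumed precomputed, e.g. by M-SSA; extraction not shown)
    mode_forecast: (n_modes,) data-driven forecast of the same modes
    """
    # Discrepancy of each member from the oscillation forecast (Euclidean, assumed)
    discrepancy = np.linalg.norm(member_modes - mode_forecast, axis=1)
    # Indices of the n_keep members with the smallest discrepancy
    best = np.argsort(discrepancy)[:n_keep]
    return ensemble[best], best


# Toy numbers for illustration only
ensemble = np.array([[1.0, 5.0], [2.0, 6.0], [9.0, 0.0], [3.0, 7.0]])
member_modes = np.array([[0.8], [1.1], [-2.0], [3.0]])
mode_forecast = np.array([1.0])
kept, idx = select_best_members(ensemble, member_modes, mode_forecast, n_keep=2)
corrected_mean = kept.mean(axis=0)
print("uncorrected mean:", ensemble.mean(axis=0))
print("corrected mean:  ", corrected_mean)
```

The corrected ensemble mean is computed only from the retained members. In this toy example, members 0 and 1 are kept.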