Title: | Univariate and Multivariate Damped Random Walk Processes |
---|---|
Description: | We provide a toolbox to fit and simulate a univariate or multivariate damped random walk process that is also known as an Ornstein-Uhlenbeck process or a continuous-time autoregressive model of the first order, i.e., CAR(1) or CARMA(1, 0). This process is suitable for analyzing univariate or multivariate time series data with irregularly-spaced observation times and heteroscedastic measurement errors. When it comes to the multivariate case, the number of data points (measurements/observations) available at each observation time does not need to be the same, and the length of each time series can vary. The number of time series data sets that can be modeled simultaneously is limited to ten in this version of the package. We use Kalman-filtering to evaluate the resulting likelihood function, which leads to a scalable and efficient computation in finding maximum likelihood estimates of the model parameters or in drawing their posterior samples. Please pay attention to loading the data if this package is used for astronomical data analyses; see the details in the manual. Also see Hu and Tak (2020) <arXiv:2005.08049>. |
Authors: | Zhirui Hu and Hyungsuk Tak |
Maintainer: | Hyungsuk Tak <[email protected]> |
License: | GPL-2 |
Version: | 1.0.2 |
Built: | 2025-01-24 02:37:47 UTC |
Source: | https://github.com/cran/Rdrw |
The function drw
fits univariate and multivariate damped random walk processes on multiple time series data sets possibly with known measurement error standard deviations via state-space representation. This function drw
evaluates the resulting likelihood function of the model parameters via Kalman-filtering whose minimum complexity is linear in the number of unique observation times. The function returns the maximum likelihood estimates or posterior samples of the model parameters. For astronomical data analyses, users need to pay attention to loading the data because R's default is to load only seven effective digits; see details below.
drw(data1, data2, data3, data4, data5, data6, data7, data8, data9, data10, n.datasets, method = "mle", bayes.n.burn, bayes.n.sample, mu.UNIFprior.range = c(-30, 30), tau.IGprior.shape = 1, tau.IGprior.scale = 1, sigma2.IGprior.shape = 1, sigma2.IGprior.scale = 1e-7)
drw(data1, data2, data3, data4, data5, data6, data7, data8, data9, data10, n.datasets, method = "mle", bayes.n.burn, bayes.n.sample, mu.UNIFprior.range = c(-30, 30), tau.IGprior.shape = 1, tau.IGprior.scale = 1, sigma2.IGprior.shape = 1, sigma2.IGprior.scale = 1e-7)
data1 |
An ( |
data2 |
Optional if more than one time series data set are available. An ( |
data3 |
Optional if more than two time series data sets are available. An ( |
data4 |
Optional if more than three time series data sets are available. An ( |
data5 |
Optional if more than four time series data sets are available. An ( |
data6 |
Optional if more than five time series data sets are available. An ( |
data7 |
Optional if more than six time series data sets are available. An ( |
data8 |
Optional if more than seven time series data sets are available. An ( |
data9 |
Optional if more than eight time series data sets are available. An ( |
data10 |
Optional if more than nine time series data sets are available. An ( |
n.datasets |
The number of time series data sets to be modeled simultaneously. An integer value inclusively between 1 to 10. For example, if " |
method |
If |
bayes.n.burn |
Required for |
bayes.n.sample |
Required for |
mu.UNIFprior.range |
Required for |
tau.IGprior.shape |
Required for |
tau.IGprior.scale |
Required for |
sigma2.IGprior.shape |
Required for |
sigma2.IGprior.scale |
Required for |
The multivariate damped random walk process is defined by the following stochastic differential equation:
where is a vector of
measurements/observations/magnitudes of the
time series data sets in continuous time
,
is a
diagonal matrix whose diagonal elements are
timescales with each
representing the timescale of the
-th time series data,
is a vector for long-term averages of the
time series data sets,
is
diagonal matrix whose diagonal elements are short-term variabilities (standard deviation) of
time series data sets, and finally
is a vector for
standard Brownian motions whose
pairwise correlations are modeled by correlation parameters
such that
.
We evaluate this continuous-time process at discrete observation times
. The observed data
are multiple time series data measured at irregularly spaced observation times
with possibly known measurement error standard deviations,
. Since one or more time series observations can be measured at each observation time
, the length of a vector
can be different, depending on how many time series observations are available at the
-th observation time. We assume that these observed data
are realizations of the latent time series data sets
with Gaussian measurement errors whose variances are
. This is a typical setting of state-space modeling. We note that if the measurement error variances are unknown,
must be set to zeros, which means that the observed data directly measure the latent values.
Please note that when astronomical time series data are loaded on R by read.table
, read.csv
, etc., some decimal places of the the observation times are automatically rounded because R's default is to load seven effective digits. For example, R will load the observation time 51075.412789 as 51075.41. This default will produce many ties in observation times even though there is actually no tie in observation times. To prevent this, please type "options(digits = 11)
" before loading the data if the observation times are in seven effective digits.
The outcomes of drw
are composed of:
The maximum likelihood estimate(s) of the long-term average(s) if method
is "mle", and the posterior sample(s) of the long-term average(s) if method
is "bayes". In the former case (mle), it is a vector of length , where
is the number of time series data sets used. In the later case (bayes), it is an (
by
) matrix where
is the size of the posterior sample.
The maximum likelihood estimate(s) of the short-term variability (standard deviation) parameter(s) if method
is "mle", and the posterior sample(s) of the short-term variability parameter(s) if method
is "bayes". In the former case (mle), it is a vector of length , where
is the number of time series data sets used. In the later case (bayes), it is an (
by
) matrix where
is the size of the posterior sample.
The maximum likelihood estimate(s) of the timescale(s) if method
is "mle", and the posterior sample(s) of the timescale(s) if method
is "bayes". In the former case (mle), it is a vector of length , where
is the number of time series data sets used. In the later case (bayes), it is an (
by
) matrix where
is the size of the posterior sample.
Only when more than one time series data set are used. The maximum likelihood estimate(s) of the (cross-) correlation(s) if method
is "mle", and the posterior sample(s) of the (cross-) correlation(s) if method
is "bayes". In the former case (mle), it is a vector of length , where
is the number of time series data sets used, i.e.,
. In the later case (bayes), it is an (
by
) matrix where
is the size of the posterior sample.
Only when method
is "bayes". The MCMC acceptance rate(s) of the long-term average parameter(s).
Only when method
is "bayes". The MCMC acceptance rate(s) of the short-term variability parameter(s).
Only when method
is "bayes". The MCMC acceptance rate(s) of the timescale(s).
Only when more than one time series data set are used with method = "bayes"
. The MCMC acceptance rate(s) of the (cross-) correlation(s).
The combined data set if more than one time series data set are used, and data1 if only one time series data set is used. This output is only available when method
is set to "bayes".
Zhirui Hu and Hyungsuk Tak
Zhirui Hu and Hyungsuk Tak (2020+), "Modeling Stochastic Variability in Multi-Band Time Series Data," arXiv:2005.08049.
########## Fitting a univariate damped random walk process ##### Fitting a univariate damped random walk process based on a simulation n1 <- 20 # the number of observations in the data set obs.time1 <- cumsum(rgamma(n1, shape = 3, rate = 1)) # the irregularly-spaced observation times y1 <- rnorm(n1) # the measurements/observations/magnitudes measure.error.SD1 <- rgamma(n1, shape = 0.01) # optional measurement error standard deviations, # which is typically known in astronomical time series data # if not known in other applications, set them to zeros, i.e., # measure.error.SD1 <- rep(0, n1) data1 <- cbind(obs.time1, y1, measure.error.SD1) # combine the single time series data set into an n by 3 matrix # Note that when astronomical time series data are loaded on R (e.g., read.table, read.csv), # the digits of the observation times are typically rounded to seven effective digits. # That means rounding may occur, which produces ties in observation times even though # the original observation times are not the same. # In this case, type the following code before loading the data. # options(digits = 11) res1.mle <- drw(data1 = data1, n.datasets = 1, method = "mle") # obtain maximum likelihood estimates of the model parameters and # assign the result to object "res1.mle" names(res1.mle) # to see the maximum likelihood estimates, # type "res1.mle$mu", "res1.mle$sigma", "res1.mle$tau" res1.bayes <- drw(data1 = data1, n.datasets = 1, method = "bayes", bayes.n.burn = 10, bayes.n.sample = 10) # obtain 10 posterior samples of each model parameter and # save the result to object "res1.bayes" # names(res1.bayes) # to work on the posterior sample of each parameter, try # "res1.bayes$mu.accept.rate", "res1.bayes$sigma.accept.rate", "res1.bayes$tau.accept.rate" # "hist(res1.bayes$mu)", "mean(res1.bayes$mu)", "sd(res1.bayes$mu)", # "median(log(res1.bayes$sigma, base = 10))", # "quantile(log(res1.bayes$tau, base = 10), prob = c(0.025, 0.975))" ##### Fitting a multivariate damped random walk process based on simulations n2 <- 10 # the number of observations in the second data set obs.time2 <- cumsum(rgamma(n2, shape = 3, rate = 1)) # the irregularly-spaced observation times of the second data set y2 <- rnorm(n2) # the measurements/observations/magnitudes of the second data set measure.error.SD2 <- rgamma(n2, shape = 0.01) # optional measurement error standard deviations of the second data set, # which is typically known in astronomical time series data # if not known in other applications, set them to zeros, i.e., # measure.error.SD2 <- rep(0, n2) data2 <- cbind(obs.time2, y2, measure.error.SD2) # combine the single time series data set into an n by 3 matrix res2.mle <- drw(data1 = data1, data2 = data2, n.datasets = 2, method = "mle") # obtain maximum likelihood estimates of the model parameters and # assign the result to object "res2.mle" res2.bayes <- drw(data1 = data1, data2 = data2, n.datasets = 2, method = "bayes", bayes.n.burn = 10, bayes.n.sample = 10) # obtain 10 posterior samples of each model parameter and # save the result to object "res2.bayes" # names(res2.bayes) # to work on the posterior sample of each parameter, try # "hist(res2.bayes$mu[, 1])", "colMeans(res2.bayes$mu)", "apply(res2.bayes$mu, 2, sd)", # "hist(log(res2.bayes$sigma[, 2], base = 10))", # "apply(log(res2.bayes$sigma, base = 10), 2, median)", # "apply(log(res2.bayes$tau, base = 10), 2, quantile, prob = c(0.025, 0.975))"
########## Fitting a univariate damped random walk process ##### Fitting a univariate damped random walk process based on a simulation n1 <- 20 # the number of observations in the data set obs.time1 <- cumsum(rgamma(n1, shape = 3, rate = 1)) # the irregularly-spaced observation times y1 <- rnorm(n1) # the measurements/observations/magnitudes measure.error.SD1 <- rgamma(n1, shape = 0.01) # optional measurement error standard deviations, # which is typically known in astronomical time series data # if not known in other applications, set them to zeros, i.e., # measure.error.SD1 <- rep(0, n1) data1 <- cbind(obs.time1, y1, measure.error.SD1) # combine the single time series data set into an n by 3 matrix # Note that when astronomical time series data are loaded on R (e.g., read.table, read.csv), # the digits of the observation times are typically rounded to seven effective digits. # That means rounding may occur, which produces ties in observation times even though # the original observation times are not the same. # In this case, type the following code before loading the data. # options(digits = 11) res1.mle <- drw(data1 = data1, n.datasets = 1, method = "mle") # obtain maximum likelihood estimates of the model parameters and # assign the result to object "res1.mle" names(res1.mle) # to see the maximum likelihood estimates, # type "res1.mle$mu", "res1.mle$sigma", "res1.mle$tau" res1.bayes <- drw(data1 = data1, n.datasets = 1, method = "bayes", bayes.n.burn = 10, bayes.n.sample = 10) # obtain 10 posterior samples of each model parameter and # save the result to object "res1.bayes" # names(res1.bayes) # to work on the posterior sample of each parameter, try # "res1.bayes$mu.accept.rate", "res1.bayes$sigma.accept.rate", "res1.bayes$tau.accept.rate" # "hist(res1.bayes$mu)", "mean(res1.bayes$mu)", "sd(res1.bayes$mu)", # "median(log(res1.bayes$sigma, base = 10))", # "quantile(log(res1.bayes$tau, base = 10), prob = c(0.025, 0.975))" ##### Fitting a multivariate damped random walk process based on simulations n2 <- 10 # the number of observations in the second data set obs.time2 <- cumsum(rgamma(n2, shape = 3, rate = 1)) # the irregularly-spaced observation times of the second data set y2 <- rnorm(n2) # the measurements/observations/magnitudes of the second data set measure.error.SD2 <- rgamma(n2, shape = 0.01) # optional measurement error standard deviations of the second data set, # which is typically known in astronomical time series data # if not known in other applications, set them to zeros, i.e., # measure.error.SD2 <- rep(0, n2) data2 <- cbind(obs.time2, y2, measure.error.SD2) # combine the single time series data set into an n by 3 matrix res2.mle <- drw(data1 = data1, data2 = data2, n.datasets = 2, method = "mle") # obtain maximum likelihood estimates of the model parameters and # assign the result to object "res2.mle" res2.bayes <- drw(data1 = data1, data2 = data2, n.datasets = 2, method = "bayes", bayes.n.burn = 10, bayes.n.sample = 10) # obtain 10 posterior samples of each model parameter and # save the result to object "res2.bayes" # names(res2.bayes) # to work on the posterior sample of each parameter, try # "hist(res2.bayes$mu[, 1])", "colMeans(res2.bayes$mu)", "apply(res2.bayes$mu, 2, sd)", # "hist(log(res2.bayes$sigma[, 2], base = 10))", # "apply(log(res2.bayes$sigma, base = 10), 2, median)", # "apply(log(res2.bayes$tau, base = 10), 2, quantile, prob = c(0.025, 0.975))"
The function drw.sim
simulates time series data set(s) following either univariate or multivariate damped random walk process.
drw.sim(time, n.datasets, measure.error.SD, mu, sigma, tau, rho)
drw.sim(time, n.datasets, measure.error.SD, mu, sigma, tau, rho)
time |
A vector containing observation times. Let us use |
n.datasets |
Any positive integer value that denotes the number of time series data sets to be simulated. In simulation, there is no upper limit in the number of time series data sets. Let's use |
measure.error.SD |
Optional if measurement error standard deviations are known and available. If one time series data set is simulated, it is a vector of length |
mu |
A vector of length |
sigma |
A vector of length |
tau |
A vector of length |
rho |
Required if more than one time series data sets are simulated |
Given the observation times and model parameter values (mu, sigma, tau, rho) possibly with known measurement error standard deviations, this function simulates
time series data sets.
The outcome of drw.sim
is composed of:
An by
matrix composed of
simulated time series data each with length
. That is, each column is corresponding to one simulated time series data set.
Zhirui Hu and Hyungsuk Tak
Zhirui Hu and Hyungsuk Tak (2020+), "Modeling Stochastic Variability in Multi-Band Time Series Data," arXiv:2005.08049.
########## Simulating a multivariate damped random walk process n <- 100 k <- 5 obs.time <- cumsum(rgamma(n, shape = 3, rate = 1)) tau <- 100 + 20 * (1 : 5) #rnorm(k, 0, 5) sigma <- 0.01 * (1 : 5) #tau <- c(1 : 5) #rnorm(k, 0, 5) #sigma <- 0.05 + 0.007 * (0 : 4) #rnorm(k, 0, 0.002) mu <- 17 + 0.5 * (1 : 5) rho.m <- matrix(0, k, k) for(i in 1 : k) { for(j in 1 : k) { rho.m[i, j] = 1.1^(-abs(i - j)) } } rho <- rho.m[upper.tri(rho.m)] measure.error.band <- c(0.010, 0.014, 0.018, 0.022, 0.026) measure.error <- NULL for(i in 1 : k) { measure.error <- cbind(measure.error, rnorm(n, measure.error.band[i], 0.002)) } x <- drw.sim(time = obs.time, n.datasets = 5, measure.error.SD = measure.error, mu = mu, sigma = sigma, tau = tau, rho = rho) plot(obs.time, x[, 1], xlim = c(min(obs.time), max(obs.time)), ylim = c(17, 20), xlab = "time", ylab = "observation") points(obs.time, x[, 2], col = 2, pch = 2) points(obs.time, x[, 3], col = 3, pch = 3) points(obs.time, x[, 4], col = 4, pch = 4) points(obs.time, x[, 5], col = 5, pch = 5) ########## Simulating a univariate damped random walk process x <- drw.sim(time = obs.time, n.datasets = 1, measure.error.SD = measure.error[, 1], mu = mu[1], sigma = sigma[1], tau = tau[1]) plot(obs.time, x)
########## Simulating a multivariate damped random walk process n <- 100 k <- 5 obs.time <- cumsum(rgamma(n, shape = 3, rate = 1)) tau <- 100 + 20 * (1 : 5) #rnorm(k, 0, 5) sigma <- 0.01 * (1 : 5) #tau <- c(1 : 5) #rnorm(k, 0, 5) #sigma <- 0.05 + 0.007 * (0 : 4) #rnorm(k, 0, 0.002) mu <- 17 + 0.5 * (1 : 5) rho.m <- matrix(0, k, k) for(i in 1 : k) { for(j in 1 : k) { rho.m[i, j] = 1.1^(-abs(i - j)) } } rho <- rho.m[upper.tri(rho.m)] measure.error.band <- c(0.010, 0.014, 0.018, 0.022, 0.026) measure.error <- NULL for(i in 1 : k) { measure.error <- cbind(measure.error, rnorm(n, measure.error.band[i], 0.002)) } x <- drw.sim(time = obs.time, n.datasets = 5, measure.error.SD = measure.error, mu = mu, sigma = sigma, tau = tau, rho = rho) plot(obs.time, x[, 1], xlim = c(min(obs.time), max(obs.time)), ylim = c(17, 20), xlab = "time", ylab = "observation") points(obs.time, x[, 2], col = 2, pch = 2) points(obs.time, x[, 3], col = 3, pch = 3) points(obs.time, x[, 4], col = 4, pch = 4) points(obs.time, x[, 5], col = 5, pch = 5) ########## Simulating a univariate damped random walk process x <- drw.sim(time = obs.time, n.datasets = 1, measure.error.SD = measure.error[, 1], mu = mu[1], sigma = sigma[1], tau = tau[1]) plot(obs.time, x)
The R package Rdrw provides a toolbox to fit and simulate univariate and multivariate damped random walk processes, possibly with known measurement error standard deviations via state-space representation. The damped random walk process is also known as an Ornstein-Uhlenbeck process or a continuous-time auto-regressive model with order one, i.e., CAR(1) or CARMA(1, 0). The package Rdrw adopts Kalman-filtering to evaluate the resulting likelihood function of the model parameters, leading to a linear complexity in the number of unique observation times. The package provides two functionalities; (i) it fits the model and returns the maximum likelihood estimates or posterior samples of the model parameters; (ii) it simulates time series data following the univariate or multivariate damped random walk process.
Package: | Rdrw |
Type: | Package |
Version: | 1.0.2 |
Date: | 2020-9-8 |
License: | GPL-2 |
Main functions: | drw , drw.sim
|
Zhirui Hu and Hyungsuk Tak
Zhirui Hu and Hyungsuk Tak (2020+), "Modeling Stochastic Variability in Multi-Band Time Series Data," arXiv:2005.08049.