froll {data.table} R Documentation

Rolling functions

Description

Fast rolling functions to calculate aggregates on sliding windows. For user-defined rolling functions, see frollapply.

Usage

  frollmean(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"),
    na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE)
  frollsum(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"),
    na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE)
  frollmax(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"),
    na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE)

Arguments

x

Vector, data.frame or data.table of integer, numeric or logical columns over which to calculate the windowed aggregations. May also be a list, in which case the rolling function is applied to each of its elements.

n

Integer vector giving rolling window size(s). This is the total number of values included in each aggregate. Adaptive rolling functions also accept a list of integer vectors when applying multiple window sizes.

fill

Numeric; value to pad by. Defaults to NA.

algo

Character, default "fast". When set to "exact", a slower (in some cases more accurate) algorithm is used. See Implementation section below for details.

align

Character, specifying the "alignment" of the rolling window, defaulting to "right". "right" covers preceding rows (the window ends on the current value); "left" covers following rows (the window starts on the current value); "center" is halfway in between (the window is centered on the current value, biased towards "left" when n is even).

na.rm

Logical, default FALSE. Should missing values be removed when computing the aggregate of each window?

has.nf

Logical. If it is known that x contains (or not) non-finite values (NA, NaN, Inf, -Inf) then setting this to TRUE/FALSE may speed up computation. Defaults to NA. See has.nf argument section below for details.

adaptive

Logical, default FALSE. Should the rolling function be calculated adaptively? See Adaptive rolling functions section below for details.

partial

Logical, default FALSE. Should the rolling window size(s) provided in n be trimmed to the available observations? See partial argument section below for details.

hasNA

Logical. Deprecated, use has.nf argument instead.

give.names

Logical, default FALSE. When TRUE, names are generated automatically from the names of x and the names of n. If the answer is an atomic vector, the argument is ignored; see examples.
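As a minimal sketch of how align, fill and na.rm interact (the vectors x and y are made-up illustration data, not part of the package):

```r
library(data.table)

x = c(1, 2, 3, 4, 5, 6)

# default: window ends on the current value, incomplete windows get fill (NA)
right  = frollmean(x, 3)                    # NA NA 2 3 4 5
# window starts on the current value
left   = frollmean(x, 3, align = "left")    # 2 3 4 5 NA NA
# window centered on the current value
center = frollmean(x, 3, align = "center")  # NA 2 3 4 5 NA
# pad incomplete windows with 0 instead of NA
padded = frollmean(x, 3, fill = 0)          # 0 0 2 3 4 5

# na.rm=TRUE drops missing values inside each window before aggregating
y = c(1, NA, 3, 4)
kept    = frollmean(y, 2)                   # NA NA NA 3.5
dropped = frollmean(y, 2, na.rm = TRUE)     # NA 1 3 3.5
```

Note that fill only pads positions where the window is incomplete; it does not replace NA values that result from missing data inside a complete window.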

Details

froll* functions accept a vector, list, data.frame or data.table. The functions operate on a single vector; when a non-atomic input is passed, the function is applied column-by-column, not to the complete set of columns at once.

Argument n accepts multiple values, so that a rolling function can be applied with several window sizes in one call. If adaptive=TRUE, then n can be a list to specify multiple window sizes for adaptive rolling computation. See Adaptive rolling functions section below for details.
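The column-by-column behaviour and multiple window sizes can be sketched as follows. The table d and its column names a and b are hypothetical illustration data; the checks only rely on the first and last list elements, whose meaning does not depend on how the results are ordered.

```r
library(data.table)

d = data.table(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 50))

# two columns and two window sizes: one result vector per (column, window) pair
res = frollsum(d, c(2, 3))
length(res)

# each result is the same as rolling over the single column on its own
first_ok = all.equal(res[[1]], frollsum(d$a, 2))
last_ok  = all.equal(res[[4]], frollsum(d$b, 3))

# a single window over a table is equivalent to applying the function per column
per_column = unname(lapply(d, frollsum, 2))
same = all.equal(frollsum(d, 2), per_column)
```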

When multiple columns and/or multiple window widths are provided, the computation runs in parallel. The exception is algo="exact", which runs in parallel even for a single column and a single window width. By default data.table uses only half of the available CPUs; see setDTthreads for details on how to tune CPU usage.
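A minimal sketch of adjusting the thread count around a rolling computation, using getDTthreads and setDTthreads. The data and the choice of a single thread are arbitrary illustration values; no timing claims are made here.

```r
library(data.table)

set.seed(1)
d = data.table(a = rnorm(1e4), b = rnorm(1e4))

getDTthreads()            # number of threads data.table currently uses

old = setDTthreads(1L)    # switch to one thread; the previous value is returned
r1 = frollmean(d, c(5, 10))

setDTthreads(old)         # restore the previous setting
r2 = frollmean(d, c(5, 10))

# the thread count changes how work is scheduled, not what is computed
same = all.equal(r1, r2)
```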

Setting options(datatable.verbose=TRUE) displays various information about how the rolling function was processed. The information is not printed in real time, only at the end of processing.
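One way to enable verbose output for a single call and then restore the previous setting is sketched below. The exact messages printed depend on the installed data.table version, so none are shown.

```r
library(data.table)

# options() returns the previous values, which makes restoring them easy
old = options(datatable.verbose = TRUE)
ans = frollmean(c(1, 2, 3, 4, 5), 2)   # diagnostic messages are printed after processing
options(old)
```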

Value

A list, except when the input is not vectorized (x is not a list, and n specifies a single rolling window), in which case a vector is returned for convenience. Thus, rolling functions can be used conveniently within data.table syntax.
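The return type rule, and why it is convenient inside data.table syntax, can be sketched as follows. DT and its columns g, v and roll are hypothetical illustration names.

```r
library(data.table)

v1 = frollmean(1:6, 3)          # single vector, single window: atomic vector
l1 = frollmean(1:6, c(2, 3))    # several windows: list with one element per window
l2 = frollmean(list(1:6), 3)    # list input: list, even with a single element

# because a plain vector is returned, the result can be assigned with := directly,
# here computed separately within each group
DT = data.table(g = c("a", "a", "a", "b", "b", "b"),
                v = c(1, 2, 3, 10, 20, 30))
DT[, roll := frollmean(v, 2), by = g]
```

Grouping with by restarts the window in each group, so the first row of every group receives the fill value.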

has.nf argument

has.nf can be used to speed up processing in cases when it is known if x contains (or not) non-finite values (NA, NaN, Inf, -Inf).

Implementation

Each rolling function has 4 different implementations. The first factor that decides which implementation is used is the adaptive argument; see the section below for details. Then, for each of those two algorithms (adaptive TRUE/FALSE), there are two algo argument values.
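The four combinations of adaptive and algo can be exercised on a small, well-conditioned input, where they are expected to agree. This is only a sketch with made-up data: it shows how each implementation is selected, not how they differ internally.

```r
library(data.table)

x = c(1, 2, 3, 4, 5, 6)
n = 3L
# for adaptive=TRUE, n must supply one window size per observation
n_adaptive = rep(n, length(x))

fast           = frollmean(x, n)                                  # adaptive=FALSE, algo="fast"
exact          = frollmean(x, n, algo = "exact")                  # adaptive=FALSE, algo="exact"
adaptive_fast  = frollmean(x, n_adaptive, adaptive = TRUE)        # adaptive=TRUE,  algo="fast"
adaptive_exact = frollmean(x, n_adaptive, adaptive = TRUE,
                           algo = "exact")                        # adaptive=TRUE,  algo="exact"

agree = all(
  isTRUE(all.equal(fast, exact)),
  isTRUE(all.equal(fast, adaptive_fast)),
  isTRUE(all.equal(fast, adaptive_exact))
)
```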

Adaptive rolling functions

Adaptive rolling functions are a special case in which each observation has its own rolling window width. The values passed to the n argument must therefore be a series corresponding to the observations in x. If multiple windows are to be computed, a list of integer vectors is expected; each list element must be an integer vector of window sizes corresponding to the observations in x; see Examples. Due to the logic or implementation of adaptive rolling functions, the following restrictions apply

partial argument

partial=TRUE turns a function into an adaptive function and trims the window sizes in the n argument to the available observations, using n = c(seq.int(n), rep(n, len-n)). It inherits the limitations of adaptive rolling functions, see above. Adaptive functions use more complex algorithms; therefore, if performance is important, partial=TRUE should be avoided in favour of computing only the incomplete observations separately after the rolling function; see examples.

zoo package users notice

Users coming from zoo, the most popular package for rolling functions, might expect the following differences in the data.table implementation

Note

Be aware that rolling functions operate on the physical order of the input. If the intent is to roll values in a vector by a logical window, for example an hour or a day, then one has to ensure that there are no gaps in the input, or use an adaptive rolling function to handle gaps by specifying the expected window sizes. For details see issue #3241.
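One possible way to derive adaptive window sizes for an input with gaps is sketched below. The time index t, the values x and the logical window width k are made-up illustration data, and the use of findInterval is an assumption of this sketch rather than a method prescribed by the package. It assumes t is sorted in increasing order.

```r
library(data.table)

t = c(1, 2, 4, 5, 8)          # observation times; 3, 6 and 7 are missing
x = c(10, 20, 30, 40, 50)     # observed values
k = 3                         # logical window: the last 3 time units, (t - k, t]

# findInterval(t - k, t) counts the observations at or before t - k, so the
# difference to the row number is the number of observations inside the window
n = seq_along(t) - findInterval(t - k, t)
n                             # 1 2 2 2 1

by_time = frollsum(x, n, adaptive = TRUE)   # 10 30 50 70 50
by_rows = frollsum(x, 3)                    # NA NA 60 90 120
```

The fixed-width call sums the last 3 rows regardless of the time elapsed between them, while the adaptive call only includes rows that fall inside the logical window.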

References

Round-off error

See Also

frollapply, shift, data.table, setDTthreads

Examples

# single vector and single window
frollmean(1:6, 3)

d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three calls above will use multiple cores when available

# frollsum
frollsum(d, 3:4)

# frollmax
frollmax(d, 3:4)

# partial=TRUE
x = 1:6/2
n = 3
ans1 = frollmean(x, n, partial=TRUE)
# same using adaptive=TRUE
an = function(n, len) c(seq.int(n), rep(n, len-n))
ans2 = frollmean(x, an(n, length(x)), adaptive=TRUE)
all.equal(ans1, ans2)
# much faster by using partial only for incomplete observations
ans3 = frollmean(x, n)
ans3[seq.int(n-1L)] = frollmean(x[seq.int(n-1L)], n, partial=TRUE)
all.equal(ans1, ans3)

# give.names
frollsum(list(x=1:5, y=5:1), c(tiny=2, big=4), give.names=TRUE)

# has.nf=FALSE should be used with care
frollmax(c(1,2,NA,4,5), 2)
frollmax(c(1,2,NA,4,5), 2, has.nf=FALSE)

# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
  ans = rep(NA_real_, nx<-length(x))
  for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
  ans
}
fastma = function(x, n, na.rm) {
  if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
  cs = cumsum(x)
  scs = shift(cs, n)
  scs[n] = 0
  as.double((cs-scs)/n)
}
system.time(ans1<-ma(x, n))
system.time(ans2<-fastma(x, n))
system.time(ans3<-frollmean(x, n))
system.time(ans4<-frollmean(x, n, algo="exact"))
system.time(ans5<-frollapply(x, n, mean, simplify=unlist))
anserr = list(
  fastma = ans2-ans1,
  froll_fast = ans3-ans1,
  froll_exact = ans4-ans1,
  frollapply = ans5-ans1
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # roundoff

[Package data.table version 1.14.3 Index]