froll {data.table} | R Documentation |
Fast rolling functions to calculate aggregates on sliding windows. For user-defined rolling functions see frollapply.
frollmean(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"),
na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE)
frollsum(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"),
na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE)
frollmax(x, n, fill=NA, algo=c("fast","exact"), align=c("right","left","center"),
na.rm=FALSE, has.nf=NA, adaptive=FALSE, partial=FALSE, hasNA, give.names=FALSE)
x | Vector, list, data.frame or data.table. |
n | Integer vector giving rolling window size(s). This is the total number of values included in the aggregate function. Adaptive rolling functions also accept a list of integer vectors when applying multiple window sizes. |
fill | Numeric; value to pad by. Defaults to NA. |
algo | Character, default "fast". |
align | Character, specifying the "alignment" of the rolling window, defaulting to "right". |
na.rm | Logical, default FALSE. |
has.nf | Logical. If it is known that x contains (or does not contain) non-finite values (NA, NaN, Inf, -Inf), setting this can speed up processing. Defaults to NA. See the has.nf argument section below. |
adaptive | Logical, default FALSE. |
partial | Logical, default FALSE. |
hasNA | Logical. Deprecated, use has.nf. |
give.names | Logical, default FALSE. |
froll* functions accept a vector, list, data.frame or data.table. Functions operate on a single vector; when a non-atomic input is passed, the function is applied column-by-column, not to a complete set of columns at once.
Argument n allows multiple values, to apply a rolling function on multiple window sizes. If adaptive=TRUE, then n can be a list to specify multiple window sizes for adaptive rolling computation. See the Adaptive rolling functions section below for details.
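The column-by-column and multiple-window behaviour described above can be sketched in base R. This is a minimal reference sketch, not data.table's implementation: roll_mean_ref is a hypothetical helper, and the result names and ordering are chosen for illustration only.

```r
# Hypothetical reference helper (not part of data.table):
# right-aligned rolling mean of one vector for one window size.
roll_mean_ref = function(x, n, fill = NA_real_) {
  ans = rep(fill, length(x))
  if (n <= length(x)) {
    for (i in n:length(x)) ans[i] = mean(x[(i - n + 1L):i])
  }
  ans
}

cols = list(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 50))
windows = c(2L, 3L)

# One result per (column, window) pair; each column is processed on its own,
# never jointly with the other columns.
res = list()
for (nm in names(cols)) {
  for (n in windows) {
    res[[paste(nm, n, sep = "_")]] = roll_mean_ref(cols[[nm]], n)
  }
}
str(res)
```

With two columns and two window sizes the sketch produces four result vectors, each the same length as its input column.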
When multiple columns and/or multiple window widths are provided, the computation runs in parallel. The exception is algo="exact", which runs in parallel even for a single column and a single window width. By default data.table uses only half of the available CPUs; see setDTthreads for details on how to tune CPU usage.
Setting options(datatable.verbose=TRUE) will display various information about how the rolling function was processed. The information is not printed in real time, only at the end of the processing.
A list, except when the input is not vectorized (x is not a list, and n specifies a single rolling window), in which case a vector is returned, for convenience. Thus, rolling functions can be used conveniently within data.table syntax.
has.nf argument
has.nf can be used to speed up processing in cases when it is known whether x contains non-finite values (NA, NaN, Inf, -Inf).
Default has.nf=NA uses the faster implementation that does not support non-finite values, but when non-finite values are detected it will re-run using the non-finite aware implementation.
has.nf=TRUE uses the non-finite aware implementation straightaway.
has.nf=FALSE uses the faster implementation that does not support non-finite values. Then, depending on the rolling function, it will either:
(mean, sum) detect non-finite values and re-run using the non-finite aware implementation.
(max) not detect non-finite values and may silently give an incorrect answer.
In general, has.nf=FALSE && any(!is.finite(x)) should be considered undefined behavior. Therefore has.nf=FALSE should be used with care.
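The condition named above can be checked directly in base R before committing to has.nf=FALSE. The helper has_non_finite below is hypothetical (it is not part of data.table) and is only a sketch of such a guard.

```r
# Hypothetical helper (not part of data.table): TRUE when x contains
# any NA, NaN, Inf or -Inf, i.e. when has.nf=FALSE would be unsafe.
has_non_finite = function(x) any(!is.finite(x))

clean = c(1, 2, 3, 4, 5)
dirty = c(1, 2, NA, 4, Inf)

has_non_finite(clean)
has_non_finite(dirty)
```

The check itself costs a full pass over x, so it is probably most useful as an assertion in tests, or when the answer is already known from an earlier processing step.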
Each rolling function has 4 different implementations. The first factor that decides which implementation is used is the adaptive argument; see the section below for details. Then for each of those two algorithms (adaptive TRUE/FALSE) there are two algo argument values.
algo="fast" uses an "on-line", single-pass algorithm.
The max rolling function does not do just a single pass; on average, length(x)/n nested loops will be computed. The bigger the window, the bigger the advantage over algo="exact", which computes length(x) nested loops. Note that exact uses multiple CPUs, so for a small window size and many CPUs it may actually be faster than fast, but in those cases elapsed timings will likely be far below a single second.
Not all functions have a fast implementation available. As of now, max with adaptive=TRUE does not have one, so it will automatically fall back to the exact implementation. The datatable.verbose option can be used to check that.
algo="exact" makes rolling functions use a more computationally intensive algorithm. For each observation in the input vector it computes the function on a window from scratch (complexity O(n^2)).
Depending on the function, this algorithm may suffer less from floating-point rounding error (the same consideration applies to base mean).
In the case of mean (and possibly other functions in future), it additionally makes an extra pass to perform floating-point error correction. Error corrections might not be truly exact on some platforms (like Windows) when using multiple threads.
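The difference between an on-line pass and a from-scratch computation can be illustrated with a base R sketch of a rolling sum. Both helpers are hypothetical and are not data.table's implementation; its internal numerics may differ, so the sketch only shows the mechanism by which an on-line update can carry rounding error forward.

```r
# Hypothetical on-line version: one pass, updating a running sum by
# adding the incoming value and subtracting the outgoing one.
roll_sum_online = function(x, n) {
  nx = length(x)
  ans = rep(NA_real_, nx)
  if (n > nx) return(ans)
  w = sum(x[seq_len(n)])
  ans[n] = w
  if (n < nx) for (i in (n + 1L):nx) {
    w = w + x[i] - x[i - n]
    ans[i] = w
  }
  ans
}

# Hypothetical from-scratch version: recompute every window independently.
roll_sum_scratch = function(x, n) {
  nx = length(x)
  ans = rep(NA_real_, nx)
  if (n > nx) return(ans)
  for (i in n:nx) ans[i] = sum(x[(i - n + 1L):i])
  ans
}

# On well-scaled input both versions agree.
x = c(1, 2, 3, 4, 5, 6)
roll_sum_online(x, 3)
roll_sum_scratch(x, 3)

# With one very large value, the running sum absorbs the small values,
# and the error stays after the large value has left the window.
y = c(1e17, 1, 1, 1, 1)
online = roll_sum_online(y, 2)
scratch = roll_sum_scratch(y, 2)
cbind(online, scratch)
```

The from-scratch version touches every window element again for each observation, which is the price paid for not carrying earlier rounding forward.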
Adaptive rolling functions are a special case where each observation has its own corresponding rolling window width. Therefore the values passed to the n argument must be a series corresponding to the observations in x. If multiple windows are to be computed, then a list of integer vectors is expected; each list element must be an integer vector of window sizes corresponding to the observations in x; see Examples. Due to the logic or implementation of adaptive rolling functions, the following restrictions apply:
align does not support "center".
if a list of vectors is passed to x, then all vectors within it must have equal length, because the length of the adaptive window widths must match the length of the vectors in x.
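The idea of one window width per observation can be sketched in base R as follows. roll_mean_adaptive_ref is a hypothetical reference helper, not data.table's implementation; it assumes right alignment and returns NA wherever the requested width exceeds the number of observations available so far.

```r
# Hypothetical reference helper (not part of data.table):
# right-aligned rolling mean where n[i] is the window width for x[i].
roll_mean_adaptive_ref = function(x, n) {
  stopifnot(length(n) == length(x))
  ans = rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (n[i] <= i) ans[i] = mean(x[(i - n[i] + 1L):i])
  }
  ans
}

x = c(1, 2, 3, 4, 5, 6)
n1 = c(1L, 2L, 3L, 2L, 2L, 1L)
n2 = c(2L, 2L, 2L, 3L, 3L, 3L)

# Single set of widths: one result vector.
roll_mean_adaptive_ref(x, n1)

# Multiple sets of widths supplied as a list: one result per list element.
lapply(list(n1, n2), function(n) roll_mean_adaptive_ref(x, n))
```

Each width vector has the same length as x, which is why vectors of unequal length cannot share one set of adaptive widths.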
partial argument
partial=TRUE will turn a function into an adaptive function and trim the window size in the n argument to the available observations, using n = c(seq.int(n), rep(n, len-n)). It inherits the limitations of adaptive rolling functions; see above. Adaptive functions use more complex algorithms, so if performance is important then partial=TRUE should be avoided in favour of computing only the missing observations separately after the rolling function; see Examples.
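The trimming formula quoted above can be evaluated on its own to see which widths it produces. In this base R sketch, partial_widths is a hypothetical wrapper around that formula, and the means are computed with plain mean rather than with data.table; it assumes len is at least n.

```r
# Hypothetical wrapper around the formula from the text (assumes len >= n).
partial_widths = function(n, len) c(seq.int(n), rep(n, len - n))

x = 1:6/2
w = partial_widths(3, length(x))
w

# Right-aligned mean over the trimmed window of each observation.
ans = vapply(
  seq_along(x),
  function(i) mean(x[(i - w[i] + 1):i]),
  numeric(1)
)
ans
```

The leading observations use growing windows of width 1 and 2, and the full width of 3 applies from the third observation onwards.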
zoo package users notice
Users coming from zoo, the most popular package for rolling functions, might expect the following differences in the data.table implementation:
rolling functions always return a result of the same length as the input.
fill defaults to NA.
fill accepts only constant values. There is no support for na.locf or other functions.
align defaults to "right".
na.rm is respected, and other functions are not needed when the input contains NA.
integers and logicals are always coerced to double.
when adaptive=FALSE (default), then n must be a numeric vector. A list is not accepted.
when adaptive=TRUE, then n must be a vector of length equal to nrow(x), or a list of such vectors.
Be aware that rolling functions operate on the physical order of the input. If the intent is to roll values in a vector by a logical window, for example an hour or a day, then one has to ensure that there are no gaps in the input, or use an adaptive rolling function to handle gaps by specifying expected window sizes. For details see issue #3241.
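One possible way to derive such window sizes from timestamps with gaps is sketched below in base R. The variable names are hypothetical, the timestamps are assumed to be sorted in ascending order, and the logical window is assumed to be the half-open interval (time - width, time].

```r
# Observation times with gaps (for example day numbers); must be sorted.
obs_time = c(1, 2, 3, 5, 6, 10)
width = 3  # logical window: the last 3 time units, ending at each observation

# findInterval() counts the observations at or before (time - width);
# subtracting that count from the position leaves those inside the window.
n_adaptive = seq_along(obs_time) - findInterval(obs_time - width, obs_time)
n_adaptive
```

A vector like n_adaptive could then be supplied as n together with adaptive=TRUE, so that each observation aggregates only the values that fall inside its logical window.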
frollapply, shift, data.table, setDTthreads
# single vector and single window
frollmean(1:6, 3)
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three calls above will use multiple cores when available
# frollsum
frollsum(d, 3:4)
# frollmax
frollmax(d, 3:4)
# partial=TRUE
x = 1:6/2
n = 3
ans1 = frollmean(x, n, partial=TRUE)
# same using adaptive=TRUE
an = function(n, len) c(seq.int(n), rep(n, len-n))
ans2 = frollmean(x, an(n, length(x)), adaptive=TRUE)
all.equal(ans1, ans2)
# much faster by using partial only for incomplete observations
ans3 = frollmean(x, n)
ans3[seq.int(n-1L)] = frollmean(x[seq.int(n-1L)], n, partial=TRUE)
all.equal(ans1, ans3)
# give.names
frollsum(list(x=1:5, y=5:1), c(tiny=2, big=4), give.names=TRUE)
# has.nf=FALSE should be used with care
frollmax(c(1,2,NA,4,5), 2)
frollmax(c(1,2,NA,4,5), 2, has.nf=FALSE)
# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
  ans = rep(NA_real_, nx<-length(x))
  for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
  ans
}
fastma = function(x, n, na.rm) {
  if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
  cs = cumsum(x)
  scs = shift(cs, n)
  scs[n] = 0
  as.double((cs-scs)/n)
}
system.time(ans1<-ma(x, n))
system.time(ans2<-fastma(x, n))
system.time(ans3<-frollmean(x, n))
system.time(ans4<-frollmean(x, n, algo="exact"))
system.time(ans5<-frollapply(x, n, mean, simplify=unlist))
anserr = list(
fastma = ans2-ans1,
froll_fast = ans3-ans1,
froll_exact = ans4-ans1,
frollapply = ans5-ans1
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # roundoff