Dependencies are a way to outsource computation that your software needs to perform to another piece of software.
Dependencies are generally classified into two types:
direct - dependencies declared directly in your software
transitive (aka recursive, nested, indirect) - dependencies of a dependency itself:
not declared in your software,
but in software that your software depends on, or…
in software further down the chain of what your software depends on.
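The distinction can be illustrated with base R's tools::package_dependencies and a toy availability matrix (hypothetical package names "yourpkg", "A", "B"; a sketch shaped like the output of available.packages(), so no network access is needed):

```r
# Toy availability matrix: yourpkg -> A -> B (hypothetical packages)
db <- cbind(
  Package   = c("yourpkg", "A", "B"),
  Depends   = NA, LinkingTo = NA, Suggests = NA, Enhances = NA,
  Imports   = c("A", "B", NA)
)
rownames(db) <- db[, "Package"]

# Direct: only what "yourpkg" itself declares (A)
tools::package_dependencies("yourpkg", db = db, which = "strong")
# Transitive: also follows dependencies of dependencies (A and B)
tools::package_dependencies("yourpkg", db = db, which = "strong", recursive = TRUE)
```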
Types of dependencies in R
Operating System level dependencies
installed from OS
OS specific!
R level dependencies
R packages
same across OSes
In order to install curl in R, we first need to install the required OS package:
## Debian / Ubuntu
sudo apt install libcurl4-openssl-dev
## Fedora / Red Hat Enterprise
sudo dnf install libcurl-devel
And then in R
install.packages("curl")
For the curl package it was “simple”,
so let’s try…
pkgdown. CRAN’s pkgdown page says:
SystemRequirements: pandoc
So we proceed with the specified requirement…
sudo apt install pandoc
We could expect that to be enough, so we now try to install pkgdown:
install.packages("pkgdown")
#...
#ERROR: dependencies httr2, openssl, ragg, xml2
# are not available for package pkgdown
What happened?
CRAN mentions only the OS dependencies of pkgdown itself,
but not the OS dependencies of its 58 (including recursive) R dependencies.
In fact, we also need to install 10 other OS packages:
sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev \
  libfontconfig1-dev libharfbuzz-dev libfribidi-dev \
  libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev
Actually, this installs many more than 10 OS packages,
as some of those have recursive dependencies as well!
We can see how tedious a process it is to set up an environment.
And we usually need three: dev, test, prod.
Those who have experienced that might have felt they entered “dependency hell”,
but this is just the tip of the iceberg!
There are projects addressing the problem by providing CRAN R packages as OS-level software, so that OS dependencies can be resolved automatically: Ubuntu: r2u; Fedora: cran2copr; …
There are multiple third-party tools to manage dependencies, each in its own way.
It is also important to weigh their use carefully: by using them we are
addressing some of the problems by introducing another (deployment-time) dependency.
And that new dependency is no different in terms of the risks discussed before:
trust, ability to fix, breaking changes (recursively!).
Within R packages, different dependency relations can be declared.
Let’s briefly review them here.
In general, “Depends” should be avoided in favor of “Imports”, as the former pollutes the search path of your package’s users.
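As a sketch (hypothetical package; data.table is used purely as an example), the difference in DESCRIPTION is just which field the dependency is listed under:

```
## Avoid: attaches data.table to the search path of every user of your package
Depends: data.table

## Prefer: available to your package's own code only
Imports: data.table
```

With Imports, the dependency's namespace is loaded but not attached, so users' search paths stay clean; your package accesses it via importFrom() in NAMESPACE or the data.table:: prefix.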
ap = available.packages(); tpd = tools::package_dependencies
pd = tpd(rownames(ap), ap, recursive=FALSE, which="strong")
summary(lengths(pd)) ## --- direct dependencies ---
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.000 2.000 4.000 5.546 8.000 57.000
pd[order(lengths(pd))] |> tail(n=3) |> lengths()
#> TOmicsVis immunarch Seurat
#> 43 45 57
pr = tpd(rownames(ap), ap, recursive=TRUE, which="strong")
summary(lengths(pr)) ## --- all dependencies ---
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.00 5.00 20.00 32.95 49.00 260.00
pr[order(lengths(pr))] |> tail(n=3) |> lengths()
#> wallace BioM2 TestAnaAPP
#> 240 252 260
your_pkg
depends on your_dependency,
and then your_dependency (which is outside of your quality control)
adds a single line to its DESCRIPTION file:
Additional_repositories: https://r-repo.evil
Then dependencies of your_dependency will be resolved from a malicious server,
as smoothly as from CRAN.
And that is already enough.
It happens that your dependency might not be actively maintained anymore,
or its maintainer rejects a modification/fix you proposed.
Then you are left with…
Avoid writing production code that depends on pre-1.0.0 versions of your dependencies.
Prefer software that puts effort in backward compatibility.
Submit your usage patterns as new unit tests to the upstream.
Packages that are pre-1.0.0 could be considered not yet stable: exported functions or arguments are still quite likely to change.
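Such versions are easy to spot programmatically; a minimal sketch (the helper name is mine) using base R's version parsing:

```r
# Flag pre-1.0.0 versions as potentially unstable.
# package_version() parses version strings and supports comparison.
is_pre_release <- function(v) package_version(v) < "1.0.0"

is_pre_release("0.9.3")  # TRUE: API may still change
is_pre_release("1.2.0")  # FALSE
```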
More dependencies, in the majority of cases, implies:
Often a package might also be slower than the corresponding base R implementation,
leading to higher computational resource usage and thus higher operational costs.
C code compiles significantly faster than C++ code,
and does not require a C++ compiler (a heavy OS dependency)!
There are many situations where a given dependency may not really be needed:
for a package that is a shiny app or a markdown report, colored console output will not be of much use.
Let’s say you need to bind a list of data.frames:
l = list(data.frame(a = 1, b = 2),
data.frame(a = 4, b = 3))
my_rbindlist = function(l) do.call("rbind", l)
all.equal(
my_rbindlist(l),
data.table::rbindlist(l) |> as.data.frame()
)
#> [1] TRUE
Although base R will be slower, more memory demanding, and not as feature rich.
Choose a vignette rendering engine that is sufficient for your needs.
A slide from a presentation by Yihui (author of knitr, blogdown, bookdown, and many others):
Since then, Yihui has developed an even more lightweight alternative to rmarkdown and markdown: litedown.
rmarkdown
C, C++, pandoc 173MB, 26 Rpkgs 47MB
markdown
C, 3 R pkgs (2.4MB) - can render Rmd but not vignettes
still needs knitr to render vignettes, totalling 7 R pkgs (5.6MB)
litedown
C, 3 Rpkgs 2.7MB
no need for knitr to render vignettes
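For orientation only (the field and engine names below are my understanding of litedown's documentation, not taken from this talk), switching a vignette to litedown looks roughly like:

```
## DESCRIPTION
VignetteBuilder: litedown

## vignette header in vignettes/intro.Rmd
%\VignetteEngine{litedown::vignette}
```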
Similarly to the vignettes engine,
we have some choice when it comes to testing your package.
The most popular ones are testthat and tinytest.
str(tools::package_dependencies(
c("testthat","tinytest"),
which="strong", recursive=TRUE
))
#List of 2
# $ testthat: chr [1:36] "brio" "callr" "cli" "desc" ...
# $ tinytest: chr [1:2] "parallel" "utils"
It’s good, it’s simple, it’s easy to use, it’s lightweight.
But still, base R has more than enough of what is necessary for package testing.
In case the licenses (of your package and your dependency) are not exactly the same, this will add a licensing burden to your project.
That implies the following changes:
In case the licenses are not well compatible, it may eventually require changing the license of your project!
Moreover, when deciding to copy the code, all of the copied code should be well understood, as you are the new maintainer of the local copy.
A package that depends on everything in the JavaScript npm
ecosystem:
Denial of Service (DoS) for anyone who installs it
npm locked down the ability for authors to unpublish their packages
In R, when we submit a package to CRAN, it undergoes a review.
But what about code review when submitting updates to a package that is already on CRAN?
Other repositories? r-universe.dev
Another vector of attack: a repository account gets hijacked.
Crypto-mining and password-stealing malware was embedded in “UAParser.js,”
a popular JavaScript npm library with over 6 million weekly downloads.
UAParser.js’s developer Faisal Salman:
I believe someone was hijacking my NPM account and published some compromised packages (0.7.29, 0.8.0, 1.0.0) which will probably install malware
However I’m sure it’s a dependency or a dependency of a dependency (of a dependency) of a number of packages being used in production by plenty of applications.
Before
Depends:
R (>= 2.15.0),
methods
After, 25th Feb 2015, DBI 1.2.3.9013: Integrate SQL package into DBI
Depends:
R (>= 2.15.0),
methods
LinkingTo: Rcpp
Imports:
Rcpp
…new heavy dependency which I’m currently not using in multiple production environments …chance to move Rcpp to Suggests or make a non-Rcpp branch?
…most people will have it anyway (since Rcpp is the most downloaded package)
Hannes: DBI depending on Rcpp? #82
…Rcpp is a huge dependency …Rcpp is only used to parse SQL strings. IMHO, this is also not really a job for DBI to semantically analyze the query strings.
Hannes: R implementation of sqlParseVariablesImpl #83
Fixes #82.
Hannes Mühleisen - creator of duckdb
RSQLite v1.0.0 (2014)
Depends:
R (>= 2.10.0),
DBI (>= 0.3.1),
methods
RSQLite v1.1.0 (2016)
Depends:
R (>= 3.1.0)
Imports:
DBI (>= 0.4-9),
memoise, ## new
methods,
Rcpp (>= 0.12.7) ## new
LinkingTo: Rcpp, BH, plogr ## new
RSQLite (2024)
Depends:
R (>= 3.1.0)
Imports:
bit64, ## new
blob (>= 1.2.0), ## new
DBI (>= 1.2.0),
memoise,
methods,
pkgconfig, ## new
rlang ## new
LinkingTo:
plogr (>= 0.2.0), ## new
cpp11 (>= 0.4.0) ## new
is it still SQLite?
The fork was made from RSQLite v1.0.0 (2014-10-25), at the time when the package was lightweight. No new features are being added here; it works well enough, and is light enough. SQLite has been upgraded to 3.31.1 (2020-01-27).
Reason for increasing the required R version to >= 4.0? evaluate#173
…I see that the required R version has been bumped from 3.0.2 to 4.0
@etiennebacher (R polars maintainer)
…unlikely to have much impact on testthat, since it already depends on brio, fs, glue, lifecycle and waldo which all depend on R 3.6.0 (and will be bumped to 4.0.0
@hadley
A couple of other projects affected by that change joined this reported issue.
str(tools::package_dependencies(
c("RPostgreSQL","RPostgres"),
which="strong", recursive=TRUE
))
#List of 2
# $ RPostgreSQL: chr [1:2] "methods" "DBI"
# $ RPostgres : chr [1:22] "bit64" "blob" "DBI" "hms" ...
less is more – tiny versus tidy by @eddelbuettel
While there is (considerable) variability (likely stemming from heterogeneous setups at GitHub Actions), the tiny approach is on average about twice as fast as the tidy approach
Many others that I was not personally involved in.
Having an identifiable set of package dependencies at any point in time is
a beginning. It's difficult to effectively control developer behaviour, so
there is a risk there, but what makes it into production can in principle
be identified and controlled.
Mirror all R packages that your project(s) required, including recursive dependencies.
tools::write_PACKAGES()
drat::insertPackages()
tools4pkgs::mirror.packages()
Essential for Test and Prod deployments.
We never want to deploy to Prod using the most up-to-date dependencies, which were not yet tested against our code.
We need the exact set of packages that is shipped to Test to also be deployed to Prod.
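A mirroring step can be sketched with base R tooling only (the function name and paths here are mine; assumes network access to a CRAN mirror when actually run):

```r
# Mirror a set of packages, with their recursive strong dependencies,
# into a local CRAN-like repository that install.packages() can use offline.
mirror_packages <- function(pkgs, repo_dir = "local-cran",
                            cran = "https://cloud.r-project.org") {
  contrib <- file.path(repo_dir, "src", "contrib")
  dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
  ap   <- available.packages(repos = cran)
  deps <- unlist(tools::package_dependencies(pkgs, ap, which = "strong",
                                             recursive = TRUE))
  # base/recommended packages ship with R, no need to mirror them
  skip <- rownames(installed.packages(priority = "high"))
  todo <- setdiff(union(pkgs, deps), skip)
  utils::download.packages(todo, destdir = contrib, type = "source",
                           repos = cran)
  # generate the PACKAGES index so the directory acts as a repository
  tools::write_PACKAGES(contrib, type = "source")
}
```

Usage would look like `mirror_packages(c("curl", "jsonlite"))`, then installing with `repos = paste0("file://", normalizePath("local-cran"))`.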
Base R tools::package_dependencies
is your friend.
str(tools::package_dependencies(
c("curl","jsonlite","data.table","duckdb"),
which="strong", recursive=TRUE
))
#List of 4
# $ curl : chr(0)
# $ jsonlite : chr "methods"
# $ data.table: chr "methods"
# $ duckdb : chr [1:3] "DBI" "methods" "utils"
Suggested dependencies are not mandatory for your package’s installation.
They are completely optional until the user reaches functionality within the package that needs a suggested dependency.
In such cases we guard every call to functionality in a suggested dependency by raising a meaningful error if the package is not installed:
if (requireNamespace("pkg", quietly=TRUE)) {
pkg::fun()
} else {
stop("'pkg' is required to run this functionality, retry after:\n",
"install.packages('pkg')\n")
}
WRE (Writing R Extensions), the bible for R packages developers, says:
Note that the recommendation to use suggested packages conditionally in tests does also apply to packages used to manage test suites: a notorious example was testthat which in version 1.0.0 contained illegal C++ code and hence could not be installed on standards-compliant platforms.
The same as every other suggested dependency! File tests/testthat.R:
if (requireNamespace("testthat", quietly=TRUE)) {
library("testthat")
test_check("your.pkg")
} else cat("Package tests have been skipped\n")
Use base R for unit tests in the
./tests/
directory of your package:
expect_identical <- function(x, y) {
  stopifnot(identical(x, y))
}
expect_equal <- function(x, y, ...) {
  stopifnot(isTRUE(all.equal(x, y, ...)))
}
You may find it useful to set _R_CHECK_NO_STOP_ON_TEST_ERROR_=true
or run R CMD check with the --no-stop-on-test-error flag.
markdown is like rmarkdown, but has fewer features.
You just need to check whether you use any of those extra features.
So, we moved 10 vignettes from engine rmarkdown to engine markdown…
data.table: vignette render with markdown rather than rmarkdown #5773
So saving 12 minutes on each test job means saving a lot of CI compute minutes. We have 8 test jobs currently, and likely we will have more. There is also build job which needs to install those. So savings of CI compute minutes are more than 100 min on a single pipeline. Aside from time we can also use lighter image for build (no need for C++ toolchain).
Developed this year by Yihui Xie (author of knitr, bookdown, blogdown).
Avoid using tidyverse as a dependency of your package.
Depends: tidyverse
Instead, explicitly list the tidyverse packages that you directly depend on.
Imports: dplyr, ggplot2
The same goes for other meta packages (mlr3verse, etc.).
It turns out it is possible, by adjusting compilation flags dynamically depending on zlib availability.
Therefore, if the zlib
OS dependency was not available during data.table installation,
the package will still install correctly and only the compression feature in fwrite()
will raise an error about the missing zlib.
data.table::fwrite(iris, "iris.tar.gz")
#Error in data.table::fwrite(iris, "iris.tar.gz") :
# Compression in fwrite uses zlib library. Its header files were not found at
# the time data.table was compiled. To enable fwrite compression, please
# reinstall data.table and study the output for further guidance.
Dependencies are invitations for other people to break your package.
– Josh Ulrich, private communication
tinyverse.org - Lightweight is the right weight
jangorecki/tools4pkgs - Helper functions, extracted from base R branch tools4pkgs, aids in administration tasks around packages development and distribution.