4 Delivery

There is not that much R in this chapter. It is more about tools to ease delivery of R projects.

4.1 Tools

We will create an example package to present a full, reproducible workflow using the tools listed below, so that it can be easily adapted to your own project.

Populate the example project with a minimal package skeleton (the DESCRIPTION fields below are placeholders):

mkdir -p mypkg/R mypkg/man mypkg/tests
printf 'Package: mypkg\nVersion: 0.1.0\nTitle: Example Package\nDescription: Minimal package used to demonstrate the delivery workflow.\nAuthor: Your Name\nMaintainer: Your Name <you@example.com>\nLicense: GPL-3\n' > mypkg/DESCRIPTION
echo 'export(f)' > mypkg/NAMESPACE
echo 'f <- function(x) x' > mypkg/R/f.R
printf '\\name{f}\n\\alias{f}\n\\title{Identity function}\n\\usage{f(x)}\n\\arguments{\\item{x}{any object}}\n\\value{x, unchanged}\n\\description{Returns its input.}\n' > mypkg/man/f.Rd
echo 'library(mypkg); x = 1; stopifnot(identical(f(x), x))' > mypkg/tests/test-f.R

We now have an example package; before proceeding, let's check it:

R CMD check mypkg

4.1.1 make

Add a Makefile to create aliases for common actions required in your workflow; it defines an intermediate layer of abstraction. We add the Makefile in the project directory, with targets like the following (a minimal Makefile sketch follows the list):

cd mypkg
touch Makefile
  • make build
  • make test
R CMD check . --ignore-vignettes --no-stop-on-test-error
  • make integration
R CMD check . --as-cran
  • make deploy
  • make all
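A minimal sketch of such a Makefile, reusing the check commands above; the deploy recipe and the extra clean target (used later by the cron job) are assumptions to adapt, and recall that recipe lines must be indented with tabs:

.PHONY: build test integration deploy clean all

build:
	R CMD build .

test:
	R CMD check . --ignore-vignettes --no-stop-on-test-error

integration:
	R CMD check . --as-cran

# assumption: deploy copies the built package into a local CRAN-like repository
deploy:
	cp mypkg_*.tar.gz /srv/cran/src/contrib/

clean:
	rm -rf *.Rcheck ..Rcheck mypkg_*.tar.gz

all: build test integration deploy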

4.1.2 git

Environments can have their code mirrored in git branches, which eases day-to-day work like the following:

git init && git add . && git commit -m 'mypkg 0.1.0'   # if the package is not a git repository yet
git checkout -b devel
echo 'g <- function() stop("lets crash")' > R/g.R
echo 'export(g)' >> NAMESPACE
echo 'library(mypkg); stopifnot(inherits(try(g(), silent=TRUE), "try-error"))' > tests/test-g.R
git add . && git commit -m 'add g'
make test

The unit tests pass, but the package structure checks do not: the missing manual entry for g is the problem.

git checkout master
git checkout -b test    # create the test branch from master if it does not exist yet
make test

4.1.3 cron

# list jobs
crontab -l
# edit jobs
crontab -e

Test our devel and test branches every day, for example at 02:00 (the package is assumed to live in $HOME/mypkg):

0 2 * * * cd $HOME/mypkg && git checkout devel && make test && make clean && git checkout test && make test

4.1.4 tracking status

You should obviously have as much unit test coverage as possible. That does not necessarily mean reaching some particular number in an online coverage service, but rather having the business requirements fully covered by defined unit tests. It is easy to apply test-driven development if required. All those tests should be included in the tests directory and run each time the package check is run. You can, and should, produce logs from your application. Logging helps in testing because you can verify which points in the code a unit test reached simply by capturing the log output and matching it against the expected strings.

echo 'f <- function(x = NULL, verbose = FALSE) { if (verbose) cat("doing something 123\n"); x }' > tests/test-verbose.R
echo 'stopifnot(any(grepl("doing something", capture.output(f(verbose = TRUE)))))' >> tests/test-verbose.R
make test
  • simulate failure
echo 'stop("something went wrong")' > tests/fail.R
make test

Errors of this kind are already tracked in an automated way by the unit tests. Another set of failures can occur at the OS level; these require a little manual maintenance, although ultimately that can be automated as well. There is a set of command-line tools that work really well for this.

  • count number of lines in a file
wc -l data.csv
# we expect the data to have at least 5 rows (6 lines including the header)
test "$(wc -l < data.csv)" -ge 6
  • going through logs
tail log.out
more log.out
less log.out
  • log entries that match pattern
grep ERROR log.out
  • using pipes |

looking up a particular option in the R CMD check help

R CMD check --help | grep test-error

tail of the log lines containing WARNING

grep WARNING log.out | tail

System monitoring

htop
free -h
lscpu | less

Working on remote sessions

# copy your public key to the server for password-less logins (user@host is a placeholder)
ssh-copy-id user@host
# byobu provides persistent terminal sessions that survive disconnects
byobu

4.2 Deployment

An example of VPS/cloud deployment, covering common deployment challenges and how to resolve them.

4.2.1 Obtain machine

As an example we will use a virtual private server (VPS) on Amazon Web Services (AWS) running spot instances.
A spot instance can be terminated at any time, whenever your maximum price per hour is outbid by someone else, but spot instances are cheap.

4.2.2 Configure machine

4.2.2.1 Update OS software

sudo apt update
sudo apt upgrade
sudo apt dist-upgrade

4.2.2.2 Add sources for recent versions of software

sudo ...
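For example, on Ubuntu one could add the CRAN apt repository to get a recent R. This is only a sketch; the key handling and the release suffix (here -cran40) change over time, so check the current instructions at https://cloud.r-project.org/bin/linux/ubuntu/:

sudo apt install --no-install-recommends software-properties-common dirmngr
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
sudo apt update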

4.2.2.3 Install software

sudo apt install gcc-8 r-base

4.2.2.4 Configure software

~/.R/Makevars
r-setup
.Rprofile
.Renviron
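What goes into these files is system specific; below is a minimal sketch, assuming the gcc-8 toolchain installed above and a local CRAN-like repository under /srv/cran (both assumptions):

# ~/.R/Makevars -- compiler used when building packages from source
CC = gcc-8
CFLAGS = -O2

# ~/.Rprofile -- default package repository, pointing at the local mirror
options(repos = c(CRAN = "file:///srv/cran"))

# ~/.Renviron -- environment variables set for every R session
R_LIBS_USER=~/R/library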

This tutorial does not go into OS user management and security.

4.2.3 Clone dependencies

Clone the required dependencies of your package into a CRAN-like repository (a subset of CRAN) to freeze the versions of all dependencies.

# helper script; this can also be done by hand or with another helper tool, like the miniCRAN package
source("https://raw.githubusercontent.com/Rdatatable/data.table/master/ci.R")
cran <- mirror.packages()
ap <- available.packages(cran)
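A rough equivalent using only base R, assuming the local repository lives under /srv/cran and that mypkg's only CRAN dependency is data.table (both assumptions, for illustration):

# mirror the recursive dependency closure into a local CRAN-like repository
repo_dir <- "/srv/cran/src/contrib"
dir.create(repo_dir, recursive = TRUE, showWarnings = FALSE)
ap   <- available.packages(repos = "https://cloud.r-project.org")
deps <- tools::package_dependencies("data.table", db = ap, recursive = TRUE)[[1]]
pkgs <- intersect(unique(c("data.table", deps)), rownames(ap))  # drops base packages
download.packages(pkgs, destdir = repo_dir, repos = "https://cloud.r-project.org", type = "source")
tools::write_PACKAGES(repo_dir, type = "source")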

4.2.4 Clone your code into dev/test/prod paths, to be used with .libPaths() when needed

  • it always starts with the dev environment, doing a “proof of concept” at the beginning
  • when it is mature enough, we add a test environment which has to run the complete set of tests
  • extended integration tests should be placed in a new integration environment where such tests are run
  • you might also want to run benchmark tests; you can easily create a benchmark environment and upgrade it for benchmarking when required, or on a schedule
  • the integration and benchmark environments should wait for the test environment to succeed; a sketch of selecting an environment library with .libPaths() follows the list
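A minimal sketch of selecting the environment library in an R session, assuming a /opt/mypkg/<env>/library layout and a MYPKG_ENV environment variable (both assumptions):

# pick the library for the current environment: dev, test or prod
env <- Sys.getenv("MYPKG_ENV", "dev")
lib <- file.path("/opt/mypkg", env, "library")
dir.create(lib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(lib, .libPaths()))
# install the package from the frozen CRAN-like repository into this environment
install.packages("mypkg", lib = lib, repos = "file:///srv/cran")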

4.2.5 Before upgrading R or packages, run the tests in the new, upgraded environment to confirm all tests pass

  • migration must be a smooth process so it can be automated easily, reducing room for human error; there are helper tools for this, some of them big projects, but the core functionality required to manage environments can easily be expressed in base R, avoiding the use of “package managers” and stripping an extra dependency.

4.2.6 Production environment

The production environment should be stripped to the minimum needed to do exactly what it has to do, so no development tools in production. Exclude any non-essential files, applications, configuration files, logs and user profiles. Ideally the production environment should be easily reproducible, and it should be upgraded only after the pending upgrades have been tested.

4.2.7 Automate deployment

You may want to retain ultimate control over deployment to production but automate everything else. In the production environment, point the R package repository path at the one prepared for deployment to production. You can automate the deployment of tags, and you can do a lot with just git checkout mybranch.
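A sketch of such a manual, tag-driven production deployment; the tag name is hypothetical and make test / make deploy are the targets sketched earlier:

# deploy an already-tested, tagged release to production
git fetch --tags
git checkout v0.1.0          # hypothetical release tag
make test && make deploy     # deploy only if the checks still pass on this machine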

4.2.8 Delivery

It is important to clearly define what has to be delivered. One structure that can be used as the delivery unit is an R package repository. How to finally roll out to production can be highly customized. There is much you can achieve using bash scripts, grep, cron, git and R. Those are highly standardized approaches, but they require a different type of maintenance than the automated ones on CI/delivery platforms like GitLab Pipelines. Using just Linux tools makes the whole chain more lightweight. Ideally you should always snapshot the version of the software you depend on, to avoid breaking changes in future versions of a dependency; that includes your CI pipelines. Because GitLab is an open source project this is not a challenging task, but most CI platforms are not open source.

4.3 CI pipelines

Optional, but recommended.

  • register a free account at https://gitlab.com
  • create a project repository
  • a note on GitLab vs GitHub: there are two very important features where GitLab beats GitHub:
    • in GitLab private repositories come for free, so it is easy to have many workspaces
    • you can self-host it and reduce the external dependency if needed

  • define the CI in YAML format in .gitlab-ci.yml; a minimal sketch is below (each job needs at least a script, and a runner image with R and make available is assumed):

build:
  script: make build
test:
  script: make test
deploy:
  script: make deploy
pages:
  script: mkdir -p public   # publishing also requires an artifacts entry with the public path
  • push your code to run the automated workflow
git checkout -b gl-ci
git add .gitlab-ci.yml
git commit -m 'defined basic GL CI workflow'
git push origin gl-ci
  • check status on
https://gitlab.com/_user/_project/pipelines

If we are satisfied with the change, we can merge it into the head of our working branch.

git checkout master
git merge gl-ci
git push origin master

When needed, push to the different environment branches:

git checkout test
...

Publishing a CRAN-like repository
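For example, the pages job from the workflow above could publish the built packages as a CRAN-like repository served by GitLab Pages (a sketch; the paths and tarball name are assumptions):

pages:
  script:
    - R CMD build .
    - mkdir -p public/src/contrib
    - cp mypkg_*.tar.gz public/src/contrib
    - Rscript -e 'tools::write_PACKAGES("public/src/contrib", type = "source")'
  artifacts:
    paths:
      - public

Clients could then install from it with install.packages("mypkg", repos = "https://_user.gitlab.io/_project"), following the placeholder convention used above.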