Challenges
Covers a broader range of scenarios, such as data import, export, cleaning, transformation, operating on remote data, package development, and productionizing your code.
R language
functions
We already used some built-in functions in the section above; now let's create our own functions.
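As a minimal sketch of writing your own function (the name rescale and its arguments are hypothetical, chosen only for illustration), a function is created with the function keyword and assigned to a name like any other object:

```r
# Hypothetical example: rescale a numeric vector to the range [0, 1].
rescale <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)    # minimum and maximum of x
  (x - rng[1]) / (rng[2] - rng[1])  # the last evaluated expression is returned
}

scaled <- rescale(c(2, 4, 6))
print(scaled)
```

Arguments with defaults, such as na.rm here, can be omitted by the caller.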
control structure
if (1) 1 else 2
exceptions
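One common way to handle exceptions in R is tryCatch. The sketch below assumes a hypothetical helper, safe_log, and only illustrates how error and warning handlers replace the result of the failing expression:

```r
# Hypothetical helper: take a logarithm, recovering from errors and warnings.
safe_log <- function(x) {
  tryCatch(
    {
      if (!is.numeric(x)) stop("x must be numeric")
      log(x)
    },
    error = function(e) {
      # Runs when stop() is called; its value becomes the result.
      message("caught error: ", conditionMessage(e))
      NA_real_
    },
    warning = function(w) {
      # Runs when log() warns, for example on negative input.
      message("caught warning: ", conditionMessage(w))
      NaN
    }
  )
}

safe_log("a")  # takes the error branch
safe_log(-1)   # takes the warning branch
safe_log(1)    # no condition raised
```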
debugging
Data Input/Output
csv
read.table()
fread(cmd = "head -n 10 data.csv")
write.table()
fwrite()
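A small round trip with the base R functions listed above could look like the sketch below. The data and column names are made up for illustration, and a temporary file stands in for data.csv. fread() and fwrite() from data.table play the same roles for larger files.

```r
# Illustrative data; the column names are made up for this sketch.
df <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
path <- tempfile(fileext = ".csv")

# Write a comma-separated file with a header and without row names.
write.table(df, path, sep = ",", row.names = FALSE, quote = FALSE)

# Read it back; header = TRUE uses the first line as column names.
df2 <- read.table(path, sep = ",", header = TRUE)

# Read only the first two data rows, similar to piping the file through head.
df_head <- read.table(path, sep = ",", header = TRUE, nrows = 2)
```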
databases
DBI, postgresql
library(RPostgreSQL)
dplyr
The copy_to function uploads a local data frame to the database; the equivalent of fetching a whole table back is the collect function.
dplyr::copy_to
Data munging
pivot, unpivot
rename columns
NA count
uniqueN
hist
decode values
temporal analysis
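A few of the munging steps listed above (renaming columns, counting NAs, counting unique values, decoding values) can be sketched in base R as follows. The data frame and the lookup labels are hypothetical. The unique count here treats NA as a value, which matches the default behaviour of data.table's uniqueN.

```r
# Hypothetical data for illustration.
df <- data.frame(
  grp  = c("a", "b", "a", NA),
  code = c(1L, 2L, 1L, 3L),
  val  = c(10, NA, 30, 40)
)

# Rename a column.
names(df)[names(df) == "val"] <- "value"

# NA count per column.
na_count <- colSums(is.na(df))

# Number of unique values per column (NA counts as a value here).
n_unique <- sapply(df, function(x) length(unique(x)))

# Decode values with a named lookup vector.
labels <- c("1" = "low", "2" = "medium", "3" = "high")
df$code_label <- unname(labels[as.character(df$code)])
```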
Machine Learning
different algos (xgboost)
interpretability (mli, dalex)
Package development
DESCRIPTION
R
tests
Unit tests.
tests/*.R
test any R script by raising error:
if (!identical(a, b)) stop("this command will fail test script during R CMD check by raising this error. When writing tests like this user should be aware of useful flag `R CMD check pkg --no-stop-on-test-error`")
cat my.pkg.Rcheck/tests/tests.Rout
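tests/*.R vs tests/*.Rout.save
We will keep the current .Rout files as the expected output for future tests of console output. R CMD check compares each new .Rout file against the matching .Rout.save file.
cp my.pkg.Rcheck/tests/tests.Rout tests/tests.Rout.save
cat tests/tests.Rout.save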
helper packages
testthat (heavy), testit (lightweight), unitizer, and many more not mentioned here
NAMESPACE
The NAMESPACE file declares what your package imports from its dependencies. It also defines which functions of your package are exported, and thus made available on the search path after a library(my.pkg) call. You should export only functions whose API is stable. Good practice is to change the API of an exported function only for good reasons and with a transition period.
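As a rough sketch, a small NAMESPACE file might look like the following. The exported function names read_input and clean_input are hypothetical; export() and importFrom() are the standard NAMESPACE directives.

```r
# NAMESPACE of a hypothetical package "my.pkg"

# Functions made available to users after library(my.pkg).
export(read_input, clean_input)

# Import only the functions the package actually uses from other packages.
importFrom(stats, median)
importFrom(utils, read.table)
```

Functions that are not exported stay internal to the package, so they can change without affecting users.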
Data products
ad-hoc scripts
export to db, csv
web api
view in web browser
shiny
view in web browser
Rserve
connect from new session, link libraries from different languages